TRENDS IN MATHEMATICS 



Mathematics and 
Computer Science III 



Algorithms, Trees, 
Combinatorics and 
Probabilities 




Michael Drmota 
Philippe Flajolet 
Daniele Gardy 
Bernhard Gittenberger 

Editors 



Birkhäuser




TRENDS IN MATHEMATICS 



Trends in Mathematics is a series devoted to the publication of volumes arising from
conferences and lecture series focusing on a particular topic from any area of mathematics. Its
aim is to make current developments available to the community as rapidly as possible
without compromise to quality and to archive these for reference.

Proposals for volumes can be sent to the Mathematics Editor at either 

Birkhäuser Verlag
P.O. Box 133 
CH-4010 Basel 
Switzerland 

or 

Birkhäuser Boston Inc.

675 Massachusetts Avenue 
Cambridge, MA 02139 
USA 



Material submitted for publication must be screened and prepared as follows: 

All contributions should undergo a reviewing process similar to that carried out by journals 
and be checked for correct use of language which, as a rule, is English. Articles without 
proofs, or which do not contain any significantly new results, should be rejected. High quality 
survey papers, however, are welcome. 

We expect the organizers to deliver manuscripts in a form that is essentially ready for direct
reproduction. Any version of TeX is acceptable, but the entire collection of files must be in
one particular dialect of TeX and unified according to simple instructions available from
Birkhäuser.

Furthermore, in order to guarantee the timely appearance of the proceedings it is essential
that the final version of the entire material be submitted no later than one year after the
conference. The total number of pages should not exceed 350. The first-mentioned author of
each article will receive 25 free offprints. To the participants of the congress the book
will be offered at a special rate.



Mathematics and 
Computer Science III 

Algorithms, Trees, 
Combinatorics and 
Probabilities 



Michael Drmota
Philippe Flajolet 
Daniele Gardy 
Bernhard Gittenberger 

Editors 




Springer Basel AG 




Editors: 



Michael Drmota 

Vienna University of Technology 

Institute of Discrete Mathematics 

and Geometry 

Wiedner Hauptstrasse 8-10

1040 Wien 

Austria 

e-mail: michael.drmota@tuwien.ac.at 

Philippe Flajolet 
INRIA Rocquencourt 
78153 Le Chesnay 
France 

e-mail: Philippe.Flajolet@inria.fr 



Daniele Gardy 

Université de Versailles-St-Quentin
PRISM

Bâtiment Descartes
45 avenue des États-Unis
78035 Versailles Cedex 
France 

e-mail: gardy@prism.uvsq.fr 

Bernhard Gittenberger 

Vienna University of Technology 

Institute of Discrete Mathematics 

and Geometry 

Wiedner Hauptstrasse 8-10

1040 Wien 

Austria 

e-mail: gittenberger@dmg.tuwien.ac.at 



2000 Mathematical Subject Classification 05-XX, 60C05, 60Gxx, 68P30, 68Q25, 68Rxx, 
68W20, 68W40, 90B15 

A CIP catalogue record for this book is available from the 
Library of Congress, Washington D.C., USA 

Bibliographic information published by Die Deutsche Bibliothek 

Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed 
bibliographic data is available in the Internet at http://dnb.ddb.de 



ISBN 978-3-0348-9620-7 ISBN 978-3-0348-7915-6 (eBook)
DOI 10.1007/978-3-0348-7915-6 



The logo on the cover is a binary search tree in which the directions of child nodes alternate 
between horizontal and vertical, and the edge lengths decrease as 1 over the square root of 
2. The tree is a Weyl tree, which means that it is a binary search tree constructed from a Weyl
sequence, i.e., a sequence (nα) mod 1, n = 1, 2, ..., where α is an irrational real number.

The PostScript drawing was generated by Michel Dekking and Peter van der Wal from the 
Technical University of Delft. 

This work is subject to copyright. All rights are reserved, whether the whole or part of the 
material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, 
recitation, broadcasting, reproduction on microfilms or in other ways, and storage in data 
banks. For any kind of use permission of the copyright owner must be obtained. 

© 2004 Springer Basel AG 

Originally published by Birkhäuser Verlag, Basel - Boston - Berlin in 2004

Softcover reprint of the hardcover 1st edition 2004 

Printed on acid-free paper produced from chlorine-free pulp. TCF ∞

ISBN 978-3-0348-9620-7 



987654321 



www.birkhauser-science.com 




Foreword 



These are the Proceedings of the International Colloquium of Mathematics 
and Computer Science held at the Vienna University of Technology, September 
13-17, 2004. This colloquium is the third one in a now regularly established series 
following the first two venues in September 2000 and September 2002 in Ver- 
sailles. The present issue is centered around Combinatorics and Random Struc- 
tures, Graph Theory, Analysis of Algorithms, Trees, Probability, Combinatorial 
Stochastic Processes, and Applications. It contains invited papers, contributed 
papers (lectures) and short communications (posters). 

The contributions have been carefully reviewed for their scientific quality and
originality by the Scientific Committee chaired by Michael Drmota (Vienna University
of Technology, Austria) and composed of Brigitte Chauvin (Université de
Versailles, France), Luc Devroye (McGill University, Canada), Daniele Gardy (Université
de Versailles, France), Philippe Flajolet (INRIA Rocquencourt, France),
Michal Karonski (Adam Mickiewicz University, Poland), Abdelkader Mokkadem
(Université de Versailles, France), Helmut Prodinger (University of Witwatersrand,
South Africa), J. Michael Steele (University of Pennsylvania, Philadelphia, USA), and
Brigitte Vallee (Université de Caen, France). We thank them and all anonymous
referees for their impressive work.

We also thank the invited speakers: Jean Bertoin (Université Paris VI,
France), Mireille Bousquet-Melou (Université Bordeaux 1, France), Hsien-Kuei
Hwang (Academia Sinica, Taiwan), Svante Janson (Uppsala University, Sweden),
Christian Krattenthaler (Université Lyon, France), Jean-François Marckert (Université
de Versailles, France), Boris Pittel (The Ohio State University, USA), Simon
Tavare (University of Southern California, USA), the authors of submitted
papers and posters, and the participants for their contribution to the success of
the conference.

Finally, we express our acknowledgements to the Institute of Discrete Mathematics
and Geometry, the Vienna University of Technology, the Federal Ministry
for Education, Science, and Culture, the City of Vienna, the Austrian Research
Society (ÖFG), the Austrian Mathematical Society (ÖMG), the Goedel-Society,
and the Bank Austria-Creditanstalt for providing generous financial and material
support.



The Organizing Committee 
Bernhard Gittenberger 
Thomas Klausner 
Alois Panholzer 




Preface 



These colloquium proceedings address problems at the interface between
Mathematics and Computer Science, with special emphasis on discrete probabilistic
models and their relation to algorithms. Combinatorial and probabilistic properties
of random graphs, random trees, combinatorial stochastic processes (such
as random walks) as well as branching processes and related topics in probability
are central. Applications are to be found in the analysis of algorithms and data
structures, the major application field, but also in statistical theory, information
theory, and mathematical logic. This colloquium is the third one in a now regularly
established series, following the first two venues in September 2000 and September
2002 in Versailles. The book features a collection of original refereed contributions:
contributed papers (lectures) and short communications (posters), supplemented
by more detailed articles written by invited speakers (and coauthors): Jean Bertoin
(and Christina Goldschmidt), Svante Janson, and Boris Pittel (and Alan Frieze).

During the final preparation of this volume we received the sad news that
Rainer Kemp (from Frankfurt, Germany) has passed away. Rainer Kemp was one of
the founding fathers of the Analysis of Algorithms, a main topic of our conference.
His book Fundamentals of the average case analysis of particular algorithms, Wiley,
1984, was one of the first books on this subject and had a considerable influence on
the development of the field. He was organizer of several meetings on this subject
and served the scientific community with many other duties. But first of all we
lost a good friend and colleague.

Combinatorics and Random Structures. The starting point of many studies 
of random discrete models is combinatorics, which often provides us with exact 
representations in terms of counting generating functions that can also be used for 
a probabilistic study. Sylvie Corteel, Guy Louchard, and Robin Pemantle work on 
the common distribution of intervals in pairs of permutations. Next Sylvie Cor- 
teel, Jeremy Lovejoy, and Ae Ja Yee provide generating functions for generalized 
Frobenius partitions. Luca Ferrari, Renzo Pinzani, and Simone Rinaldi present 
some results on integer partitions. Toufik Mansour considers generating functions 
for 321-avoiding permutations. Eugenijus Manstavicius proves an iterated loga- 
rithm law for the cycle lengths of a random permutation. Martin Rubey provides 
a sufficient condition for transcendence of generating functions of walks on the slit 
plane. Michael Schlosser finds some curious q-series expansions, and Klaus Simon
presents relations between the numbers of partitions and the divisor functions. 

Graph Theory. Graphs are a basic object in discrete mathematics. They are
widely used in applications, and algorithms on graphs as well as theoretical questions
on graphs have been “modern topics” of research in mathematics and computer
science for several decades. Mindaugas Bloznelis uses Hoeffding decomposition
to prove asymptotic normality of subgraph count statistics. Robert Cori,
Arnaud Dartois, and Dominique Rossin compute so-called “avalanche polynomials”
for certain families of graphs. The invited paper by Alan Frieze and Boris
Pittel gives a detailed analysis of perfect matchings in random graphs with prescribed
minimal degree. Omer Gimenez and Marc Noy provide very tight estimates
for the growth constant of labelled planar graphs. Finally, Stavros D. Nikolopoulos
and Charis Papadopoulos present an algorithm for determining the number of
spanning trees in P4-reducible graphs.

Analysis of Algorithms. This field was created by Donald E. Knuth and is
concerned with accurate estimates of complexity parameters of algorithms and
aims at predicting the behavior of a given algorithm. Javiera Barrera and Christian
Paroissin consider specific search cost in random binary search trees. Monia
Bellalouna, Salma Souissi, and Bernard Ycart analyze probabilistic bin packing
problems. Pawel Hitczenko, Jeremy Johnson, and Hung-Jen Huang consider
algorithms for computing the Walsh-Hadamard transform. Tamur Ali Khan and
Ralph Neininger analyze the performance of the randomized algorithm to evaluate
Boolean decision trees proposed by Snir; in particular, they consider the worst
case input and provide limit laws and tail estimates. Next, Shuji Kijima and Tomomi
Matsui propose a polynomial time perfect sampling algorithm for two-rowed
contingency tables. Conrado Martinez and Xavier Molinero combine two generation
algorithms to obtain a new efficient algorithm for the generation of unlabelled
cycles. Finally, Yuriy A. Reznik and Anatoly V. Anisimov suggest the use of tries
for universal data compression.

Trees. Trees are perhaps the most important structure in computer science.
They appear as data structures and are used in various algorithms such as
data compression. David Auber, Jean-Philippe Domenger, Maylis Delest, Philippe
Duchon, and Jean-Marc Fedou present an extension of Strahler numbers to rooted
plane trees. Julien Fayolle analyzes mean size and external path length of a suffix
tree that is related to the LZ’77 data compression algorithm. Eric Fekete considers
two different kinds of external nodes in binary search trees and describes the evolution
of this process in terms of martingales. The invited paper by Svante Janson
offers an analysis of the number of records in a complete binary tree or, equivalently,
the number of random cuttings needed to eliminate a complete binary tree. Interestingly,
the distribution is, after normalization, asymptotically a periodic function in
log n - log log n, where n is the size of the tree. Mehri Javanian and Mohammad Q.
Vahidi-Asl consider multidimensional interval trees. Anne Micheli and Dominique
Rossin describe a specific distance between unlabelled ordered trees that is based
on deletions and insertions of edges. Katherine Morris determines grand averages
on some parameters in monotonically labelled tree structures. Tatiana Myllari
proves local central limit theorems for the number of vertices of a given outdegree
in a Galton-Watson forest. And finally, Alois Panholzer gives a precise analysis of
the cost distribution for destroying recursive trees in the case of toll functions of
polynomial growth.

Probability. Probabilistic methods are becoming more and more important in the analysis
of discrete structures: random graphs, random trees, average case analysis of algorithms,
etc. Margaret Archibald addresses the question of the probability that the
maximum in a geometrically distributed sample occurs in the first d positions of a
word. The invited paper by Jean Bertoin and Christina Goldschmidt describes the
duality between a fragmentation associated to certain Dirichlet distributions and
a natural random coagulation. This gives rise to an application to the genealogy
of Yule processes. Mykola S. Bratiychuk considers semi-Markov walks in queueing
and risk theory. Amke Caliebe characterizes fixed points of linear stochastic
fixed point equations as mixtures of infinitely divisible distributions. Peter Jagers
and Uwe Rösler describe a systematic approach to find solutions of stochastic
fixed points involving the maximum. Arnold Knopfmacher and Helmut Prodinger
provide central limit theorems for the number of descents in samples of geometric
random variables. Alain Rouault proves a law of large numbers and describes
a new large deviation phenomenon for cascades. Christiane Takacs investigates
partitioning properties of piecewise constant eigenvectors of matrices describing
the mutual positions of points. Vladimir Vatutin and Elena Dyakonova consider
branching processes in random environment, find asymptotics of the survival probabilities
and prove a Yaglom type limit theorem. Finally, Vladimir Vatutin and
Valentin Topchii study the joint distribution of the number of individuals at the
origin and outside the origin on a continuous time random walk on the integers.

Combinatorial Stochastic Processes. Random walks are the most prominent
representatives of combinatorial stochastic processes. They play a central role in
the interplay between combinatorics and probability. Enrica Duchi and Gilles
Schaeffer consider a model of particles jumping on a row of cells with general
boundary conditions where the stationary distribution is not uniform. Guy Fayolle
and Cyril Furtlehner study stochastic deformations of sample paths of random
walks. Johannes Fehrenbach and Ludger Rüschendorf show that a Markov chain
that is naturally defined on the Eulerian orientations of a planar graph converges to
the uniform distribution. Alexander Gnedin considers regenerative composition structures.
Jean Mairesse and Frederic Matheus study transient nearest neighbor random
walks on groups with a finite set of generators and compute various characteristics
such as the drift and the entropy. Finally, Philippe Marchal gives a
fractal construction of nested, stable regenerative sets and studies the associated
inhomogeneous fragmentation process.

Applications. Random combinatorics interacts with many other areas of science.
Eda Cesaratto and Brigitte Vallee consider numeration schemes, defined in
terms of dynamical systems, and determine the Hausdorff dimension of sets of reals
which obey some constraints on their digits. Adriana Climescu-Haulica deals with
large deviation analysis of space-time Trellis codes. David Coupier, Agnes Desolneux,
and Bernard Ycart provide a zero-one law for first order logic on random
images. Nadia Creignou and Herve Daude study threshold phenomena for random
generalized satisfyability problems. Guy Fayolle, Vadim Malyshev, and Serguei
Pirogov introduce new models of energy redistribution in stochastic chemical kinetics
with several molecule types and energy parameters. Laszlo Gyorfi discusses
Chernoff type large deviations of Hellinger distance on partitions. Nadia Lalam
and Christine Jacob address the problem of estimating the offspring mean for a
general class of size-dependent branching processes. Malgorzata and Wlodzimierz
Moczurad deal with the problem of decidability of simple brick codes. And finally,
Joel Ratsaby generalizes Sauer’s Lemma to finite VC-dimension classes of binary
valued functions.

Altogether, the papers assembled in this volume offer snapshots of current research.
At the same time, they illustrate the numerous ramifications of the theory
of random discrete structures throughout mathematics and computer science.
Many of them, in particular invited lectures, include carefully crafted surveys of
their field. We thus hope that the book may serve both as a reference text and as a
smooth introduction to many fascinating aspects of this melting pot of continuous
and discrete mathematics.

Michael Drmota 
Philippe Flajolet 
Daniele Gardy 
Bernhard Gittenberger 




Contents 



PART I. Combinatorics and Random Structures

Common Intervals of Permutations 

Sylvie Corteel, Guy Louchard, and Robin Pemantle 3 

Overpartitions and Generating Functions for Generalized Frobenius Partitions

Sylvie Corteel, Jeremy Lovejoy, and Ae Ja Yee 15 

Enumerative Results on Integer Partitions Using the ECO Method 

Luca Ferrari, Renzo Pinzani, and Simone Rinaldi 25 

321-Avoiding Permutations and Chebyshev Polynomials

Toufik Mansour 37 

Iterated Logarithm Laws and the Cycle Lengths of a Random Permutation 

Eugenijus Manstavicius 39 

Transcendence of Generating Functions of Walks on the Slit Plane 

Martin Rubey 49 

Some Curious Extensions of the Classical Beta Integral Evaluation 

Michael Schlosser 59 

Divisor Functions and Pentagonal Numbers 

Klaus Simon 69 

PART II. Graph Theory 

On Combinatorial Hoeffding Decomposition and Asymptotic Normality of 
Subgraph Count Statistics 

Mindaugas Bloznelis 73 

Avalanche Polynomials of Some Families of Graphs

Robert Cori, Arnaud Dartois, and Dominique Rossin 81 

Perfect Matchings in Random Graphs with Prescribed Minimal Degree 

Alan Frieze and Boris Pittel 95 

Estimating the Growth Constant of Labelled Planar Graphs 

Omer Gimenez and Marc Noy 133 

The Number of Spanning Trees in P4-Reducible Graphs 

Stavros D. Nikolopoulos and Charis Papadopoulos 141 





PART III. Analysis of Algorithms 

On the Stationary Search Cost for the Move-to-Root Rule with Random 
Weights 

Javiera Barrera and Christian Paroissin 147 

Average-Case Analysis for the Probabilistic Bin Packing Problem 

Monia Bellalouna, Salma Souissi, and Bernard Ycart 149 

Distribution of WHT Recurrences 

Pawel Hitczenko, Jeremy R. Johnson, and Hung-Jen Huang 161

Probabilistic Analysis for Randomized Game Tree Evaluation 

Tamur Ali Khan and Ralph Neininger 163 

Polynomial Time Perfect Sampling Algorithm for Two-Rowed Contingency 
Tables 

Shuji Kijima and Tomomi Matsui 175 

An Efficient Generic Algorithm for the Generation of Unlabelled Cycles

Conrado Martinez and Xavier Molinero 187 

Using Tries for Universal Data Compression 

Yuriy A. Reznik and Anatoly V. Anisimov 199 

PART IV. Trees 

New Strahler Numbers for Rooted Plane Trees 

David Auber, Jean-Philippe Domenger, Maylis Delest, Philippe Duchon, and 
Jean-Marc Fedou 203 

An Average-Case Analysis of Basic Parameters of the Suffix Tree

Julien Fayolle 217 

Arms and Feet Nodes Level Polynomial in Binary Search Trees 

Eric Fekete 229 

Random Records and Cuttings in Complete Binary Trees 

Svante Janson 241 

Multidimensional Interval Trees 

Mehri Javanian and Mohammad Q. Vahidi-Asl 255 

Edit Distance between Unlabelled Ordered Trees 

Anne Micheli and Dominique Rossin 257 



On Parameters in Monotonically Labelled Trees

Katherine Morris 261





Number of Vertices of a Given Outdegree in a Galton-Watson Forest 

Tatiana Myllari 265 

Destruction of Recursive Trees 

Alois Panholzer 267 

PART V. Probability 

Restrictions on the Position of the Maximum/Minimum in a Geometrically 
Distributed Sample 

Margaret Archibald 283 

Dual Random Fragmentation and Coagulation and an Application to the 
Genealogy of Yule Processes 

Jean Bertoin and Christina Goldschmidt 295 

Semi-Markov Walks in Queueing and Risk Theory 

Mykola S. Bratiychuk 309

Representation of Fixed Points of a Smoothing Transformation 

Amke Caliebe 311 

Stochastic Fixed Points for the Maximum 

Peter Jagers and Uwe Rösler 325

The Number of Descents in Samples of Geometric Random Variables 

Arnold Knopfmacher and Helmut Prodinger 339 

Large Deviations for Cascades and Cascades of Large Deviations 

Alain Rouault 351 

Partitioning with Piecewise Constant Eigenvectors 

Christiane Takacs 363 

Yaglom Type Limit Theorem for Branching Processes in Random Environment 

Vladimir Vatutin and Elena Dyakonova 375 

Two-Dimensional Limit Theorem for a Critical Catalytic Branching Random 
Walk 

Valentin Topchii and Vladimir Vatutin 387 

PART VI. Combinatorial Stochastic Processes 

A Combinatorial Approach to Jumping Particles II: General Boundary 
Conditions 

Enrica Duchi and Gilles Schaeffer 399 







Stochastic Deformations of Sample Paths of Random Walks and Exclusion 
Models 

Guy Fayolle and Cyril Furtlehner 415 

A Markov Chain Algorithm for Eulerian Orientations of Planar Triangular 
Graphs 

Johannes Fehrenbach and Ludger Rüschendorf 429

Regenerative Composition Structures: Characterisation and Asymptotics of
Block Counts

Alexander Gnedin 441 

Random Walks on Groups With a Tree-Like Cayley Graph 

Jean Mairesse and Frederic Matheus 445 

Nested Regenerative Sets and Their Associated Fragmentation Process 
Philippe Marchal 461 

PART VII. Applications 

Real Numbers with Bounded Digit Averages 

Eda Cesaratto and Brigitte Vallee 473 

Large Deviation Analysis of Space-Time Trellis Codes

Adriana Climescu-Haulica 491 

A Zero-One Law for First-Order Logic on Random Images 

David Coupier, Agnes Desolneux, and Bernard Ycart 495 

Coarse and Sharp Transitions for Random Generalized Satisfyability Problems 

Nadia Creignou and Herve Daude 507 

Stochastic Chemical Kinetics with Energy Parameters 

Guy Fayolle, Vadim Malyshev, and Serguei Pirogov 517 

Large Deviations of Hellinger Distance on Partitions 

Laszlo Gyorfi 531 

Estimation of the Offspring Mean for a General Class of Size-Dependent
Branching Processes. Application to Quantitative Polymerase Chain Reaction

Nadia Lalam and Christine Jacob 539 



Decidability of Simple Brick Codes

Malgorzata Moczurad and Wlodzimierz Moczurad 541







A Constrained Version of Sauer’s Lemma

Joel Ratsaby 543

Index 553

Author Index 555




Part I 

Combinatorics and Random Structures 




Trends in Mathematics, © 2004 Birkhäuser Verlag Basel/Switzerland



Common Intervals of Permutations 

Sylvie Corteel, Guy Louchard, and Robin Pemantle 



ABSTRACT: An interval of a permutation is a consecutive substring consisting
of consecutive symbols. For example, 4536 is an interval in the permutation
71453682. These arise in genetic applications. For the applications, it makes sense
to generalise so as to allow gaps of bounded size $\delta - 1$, both in the locations and
the symbols. For example, 4527 has gaps bounded by 1 (since 3 and 6 are missing)
and is therefore a $\delta$-interval of ****4*5*27**** for $\delta = 2$.

After analysing the distribution of the number of intervals of a uniform ran- 
dom permutation, we study the number of 2-intervals. This is exponentially large, 
but tightly clustered around its mean. Perhaps surprisingly, the quenched and an- 
nealed means are the same. Our analysis is via a multivariate generating function 
enumerating pairs of potential 2-intervals by size and intersection size. 

1. Introduction 

Let $[n]$ denote the set $\{1, 2, \ldots, n\}$. We are interested in counting the common
intervals of a pair of permutations. To be precise, if $G_a$ and $G_b$ are two permutations
of $[n]$, we are interested in counting the pairs of intervals $(I, J)$ for which
$G_a(I) = G_b(J)$. It is equivalent to count intervals $I$ for which $G_b^{-1} G_a(I)$ is also
an interval. Accordingly, we define

Definition 1.1. The interval $I := [i, i+k-1] \subseteq [n]$ is called an interval of the
permutation $G$ if $G^{-1}(I)$ is an interval, that is, if there is a $j$ such that

$$G[j, j+k-1] = [i, i+k-1].$$

The proper intervals are those whose lengths are at least 2 and at most $n - 1$.

Here and throughout, we use vector notation for permutations rather than
cycle notation, so that $(\sigma_1, \ldots, \sigma_n)$ denotes the permutation $i \mapsto \sigma_i$ rather than
the permutation consisting of a single $n$-cycle.

Example: Let $G$ be the permutation (3, 1, 2, 4, 5). Then the proper intervals of $G$
are [1,2], [4,5], [1,3] and [1,4].
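
Definition 1.1 can be checked mechanically on small cases. The following is a minimal brute-force sketch, not one of the linear-time algorithms cited in the text; the function name `proper_intervals` and the tuple representation of a permutation are illustrative assumptions. It tests, for every block of consecutive values, whether the positions of those values are also consecutive.

```python
def proper_intervals(perm):
    """Return the proper intervals (i, i+k-1) of a permutation in vector notation.

    perm is a tuple (G(1), ..., G(n)). Brute force, roughly O(n^3); illustrative only.
    """
    n = len(perm)
    # position[v] is the 1-based index j with G(j) = v, i.e. the inverse permutation.
    position = {value: index for index, value in enumerate(perm, start=1)}
    found = []
    for k in range(2, n):                 # proper lengths are 2, ..., n-1
        for i in range(1, n - k + 2):     # candidate block of values [i, i+k-1]
            places = [position[v] for v in range(i, i + k)]
            # k distinct positions are consecutive exactly when their spread is k-1.
            if max(places) - min(places) == k - 1:
                found.append((i, i + k - 1))
    return found


print(proper_intervals((3, 1, 2, 4, 5)))  # [(1, 2), (4, 5), (1, 3), (1, 4)]
```

The output is ordered by length and then by left endpoint, which is a choice of this sketch rather than anything prescribed by the definition.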

When $G$ is a random variable, uniformly distributed over all permutations of
$[n]$, let $X_k$ denote the number of intervals of length $k$ of $G$ and let $X = \sum_k X_k$
denote the number of intervals of $G$. Uno et al. [18] compute $\mathbb{E} X_k$; the following
easy proposition is proved at the end of this section.

Proposition 1.2. As $n \to \infty$, the distribution of $X$ converges to a Poisson with
mean 2.

The number of intervals, or runs of a permutation, was studied in the forties
by Kaplansky [12] and Wolfowitz [19, 20] from a statistical point of view. See also
[14]. Recently several algorithms were designed to efficiently enumerate all common
intervals of permutations [10, 18] and their time complexity is $O(n + K)$ where $n$ is
the size of the permutation and $K$ the number of intervals. These algorithms were
designed because common intervals have several applications. They relate to the
consecutive arrangement problem [7]. Genetic algorithms for sequencing problems
are based on common intervals [13, 15]. In bioinformatics [4, 5, 9, 10, 11], genomes
of prokaryotes can be modelled as a permutation of genes. A common interval
is then a set of orthologous genes that appear consecutively, possibly in different
orders, in two genomes. Therefore common intervals can be used to detect groups
of genes that are functionally associated [10, 11]. As the annotation of genomes
is not perfect, the notion of consecutivity in intervals needs to be relaxed. A
notion of gene teams was defined in [6], where a gene team is a maximal set
of orthologous genes, possibly occurring in different orders in the two species, but
separated in each case by gaps that do not exceed a fixed threshold, $\delta$. To study
these, we consider a generalisation of intervals, namely $\delta$-intervals (the previous
case corresponds to $\delta = 1$).

Definition 1.3. The set $I \subseteq [n]$ is called a $\delta$-interval of $[n]$ of length $k$ if $I$ is a set
of integers $\{i_1, \ldots, i_k\}$ with $1 \le i_{r+1} - i_r \le \delta$ for each $1 \le r \le k - 1$. We call $I$ a
$\delta$-interval of length $k$ of $G$ if both $I$ and $G^{-1}(I)$ are $\delta$-intervals. Proper $\delta$-intervals
are again those of cardinality at least 2 and at most $n - 1$.

Example: $G = (3, 1, 2, 4, 5)$ possesses the proper 2-intervals:

{1,2}, {1,3}, {2,3}, {2,4}, {4,5},
{1,2,3}, {1,2,4}, {1,3,4}, {2,3,4}, {2,3,5}, {2,4,5},
{1,2,3,4}, {1,2,3,5}, {1,2,4,5}, {1,3,4,5}, {2,3,4,5}
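
Definition 1.3 can likewise be tested by exhaustive search on small permutations. The sketch below is an illustration only; the names `is_delta_interval` and `proper_delta_intervals` are hypothetical, and the search over all subsets is exponential in $n$, so it is not a substitute for the polynomial-time gene-team algorithm of [6].

```python
from itertools import combinations


def is_delta_interval(elements, delta):
    """True if consecutive elements of the sorted set differ by at most delta."""
    ordered = sorted(elements)
    return all(1 <= b - a <= delta for a, b in zip(ordered, ordered[1:]))


def proper_delta_intervals(perm, delta):
    """List the proper delta-intervals of perm = (G(1), ..., G(n)) by brute force."""
    n = len(perm)
    # position[v] is the index j with G(j) = v, so it maps I to G^{-1}(I).
    position = {value: index for index, value in enumerate(perm, start=1)}
    result = []
    for k in range(2, n):  # proper cardinalities are 2, ..., n-1
        for subset in combinations(range(1, n + 1), k):
            if not is_delta_interval(subset, delta):
                continue
            preimage = [position[v] for v in subset]
            if is_delta_interval(preimage, delta):
                result.append(subset)
    return result


for interval in proper_delta_intervals((3, 1, 2, 4, 5), 2):
    print(set(interval))
```

With `delta=1` the same function returns the ordinary proper intervals of Definition 1.1, written out as sets of values.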

In [6] a polynomial time enumeration algorithm for gene teams is presented.
Our notion of $\delta$-intervals removes the maximality constraint, whence the number
of these may grow exponentially and it is natural to enumerate asymptotically
rather than enumerating exactly.

The main purpose of this note is to investigate the asymptotic properties
of $X_k^{(\delta)}$, where this denotes the number of $\delta$-intervals of length $k$ of a uniformly
chosen random permutation of $[n]$, and of the total number $X^{(\delta)} := \sum_k X_k^{(\delta)}$ of
$\delta$-intervals of a random permutation. We are interested in all $\delta > 1$ but in the
present manuscript we examine only the case $\delta = 2$. To reduce the number of
superscripts, we let $Y$ and $Y_k$ denote $X^{(2)}$ and $X_k^{(2)}$ respectively.

The number $X^{(\delta)}$ of $\delta$-intervals when $\delta > 1$ behaves very differently from $X$.
Whereas $X$ is $O(1)$ as $n \to \infty$, with all the contributions coming from short
intervals, there will typically be many $\delta$-intervals. In fact a thumbnail computation
produces numbers $\alpha_\delta$ in the unit interval ($\alpha_2 \approx 0.57939$) such that for $k \sim \alpha n$ and
$\alpha > \alpha_\delta$, the random variable $X_k^{(\delta)}$ will be typically exponentially large: the number
of $\delta$-intervals of $[n]$ of size $k$ grows exponentially, the probability of $G^{-1}$ of one of
these also being a $\delta$-interval decays exponentially, and the growth overcomes the
decay when $\alpha > \alpha_\delta$.

Seeing that $X^{(\delta)}$ grows exponentially in $n$, it is natural to look at the rescaled
quantity $n^{-1} \log X^{(\delta)}$. In the next section we compute the annealed mean, namely
$n^{-1} \log \mathbb{E} X^{(\delta)}$. The term “annealed” means that we first take an expectation over
the (uniform) measure on permutations. The more interesting quantities are the
quenched quantities, which refer to the typical, rather than the mean behaviour of
$X^{(\delta)}$. Often one has a so-called lottery effect, meaning that the mean of a quantity
$X$ comes primarily from an exponentially small number of values that are exponentially
larger than the median value, and that consequently, $\mathbb{E} \log X < \log \mathbb{E} X$.
For example, when there is a Gaussian limit law, $n^{-1/2}(\log X - n\mu) \to N(0, \sigma^2)$,
then one will typically have a lottery effect. Perhaps surprisingly in light of the
discussion in section 4, there is no lottery effect. Our main result, Theorem 4.1
below, is that for $\delta = 2$, we have [...]. This shows that as
$n \to \infty$, the sequence $\tilde{X} := X^{(2)} / \mathbb{E} X^{(2)}$ is tight. The mean of $X^{(2)}$ is computed in
the next section, with the remaining sections devoted to the computation of the
second moment. We start in Section 2 with arguments in the case $\delta = 1$.
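
The gap between quenched and annealed means in a lottery effect can be made explicit in a toy case. The following worked computation is an illustration under an assumption not made in the text: that $\log X$ is exactly Gaussian with mean $n\mu$ and variance $n\sigma^2$, rather than only asymptotically so.

```latex
% Assumption (illustrative): \log X \sim N(n\mu, n\sigma^2) exactly.
\begin{align*}
  \mathbb{E} \log X &= n\mu, \\
  \log \mathbb{E} X &= \log \mathbb{E}\, e^{\log X} = n\mu + \tfrac{1}{2} n \sigma^2, \\
  \tfrac{1}{n} \log \mathbb{E} X - \tfrac{1}{n}\, \mathbb{E} \log X &= \tfrac{1}{2}\sigma^2 > 0 .
\end{align*}
```

In this toy model the annealed rate exceeds the quenched rate by $\sigma^2/2$, which is the kind of discrepancy that the absence of a lottery effect rules out.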



2. Intervals 



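Recall that $X_k$ denotes the number of intervals of length $k$ of $G$ and that
$X = \sum_k X_k$ denotes the number of intervals of $G$. Uno et al. [18] computed

$$\mathbb{E}(X_2) = \frac{2(n-1)}{n}; \qquad \mathbb{E}(X_3) = \frac{6(n-2)}{n(n-1)}; \qquad \mathbb{E}(X_k) < [\ldots] \quad \text{for } k \ge 4.$$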



Although this was not explicitly stated in [18], it is not hard to show
Proposition 1.2.



Proof of Proposition 1.2: Letting $X' := \sum_{k=3}^{n-1} X_k$ we see that

$$[\ldots]$$

so $X' \to 0$ in probability as $n \to \infty$. Thus it suffices to show that $X_2$ converges
to a Poisson of mean 2. Kaplansky proves this in [12]. A modern approach is to
use the Poisson approximation machinery first developed by Chen and Stein, and
put in an explicit and usable form in [1]. Given $k \in [n-1]$, let $A_k$ be the event
that $G^{-1}\{k, k+1\}$ is an interval. Let $B_k = \{k-1, k, k+1\} \cap [n-1]$. Write $p_k$
for $\mathbb{P}(A_k)$ and $p_{kj}$ for $\mathbb{P}(A_k \cap A_j)$. Theorem 1 of [2] shows that the total variation
distance between $X_2$ and a Poisson with mean $\sum_k p_k$ is bounded by the sum of
three quantities, namely

$$b_1 := \sum_k \sum_{j \in B_k} p_k p_j,$$

$$b_2 := \sum_k \sum_{k \ne j \in B_k} p_{kj},$$

$$b_3 := \sum_k \mathbb{E} \left| \mathbb{E}\left( \mathbf{1}_{A_k} - p_k \mid \sigma(A_j : j \notin B_k) \right) \right|.$$

Bounding $b_1$ and $b_2$ by $O(1/n)$ is straightforward. The same bound on $b_3$ follows
from the identity

$$\mathbb{E} \left| \mathbb{E}( \mathbf{1}_{A_k} - p_k \mid \sigma) \right| = 2 \sup_{H \in \sigma} \left[ \mathbb{P}(H \cap A_k) - p_k \mathbb{P}(H) \right] = 2 p_k \| \mu_{A_k} - \mu \|_{TV}$$

together with a coupling argument obtaining $\mu_{A_k}$ from $\mu$ by switching the values
of $G^{-1}$ on $k$ and $G(j)$, and on $k+1$ and $G(j+1)$ for a uniformly chosen $j$. □



More generally, one may consider the distribution of $X_k$. We have

$$\mathbf{E}(X_k) = \frac{k!\,(n-k+1)}{n(n-1)\cdots(n-k+2)}. \qquad (1)$$
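Formula (1) can be checked by exhaustive enumeration for small $n$. The sketch below is only an illustration, not part of the original argument: it assumes that an interval of length $k$ is a window of $k$ consecutive positions whose values form a set of $k$ consecutive integers, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def expected_intervals_formula(n, k):
    """E(X_k) from formula (1): k!(n-k+1) / (n(n-1)...(n-k+2))."""
    denom = 1
    for j in range(k - 1):  # the k-1 factors n, n-1, ..., n-k+2
        denom *= n - j
    return Fraction(factorial(k) * (n - k + 1), denom)


def expected_intervals_bruteforce(n, k):
    """Average number of length-k intervals over all permutations of [n]."""
    total = 0
    for perm in permutations(range(1, n + 1)):
        for start in range(n - k + 1):
            window = perm[start:start + k]
            # k distinct values are consecutive exactly when max - min = k - 1
            if max(window) - min(window) == k - 1:
                total += 1
    return Fraction(total, factorial(n))


if __name__ == "__main__":
    for n in range(3, 8):
        for k in range(2, n):
            print(n, k, expected_intervals_formula(n, k),
                  expected_intervals_bruteforce(n, k))
```

For $k = 2$ the formula reduces to $2(n-1)/n$, the value attributed to [18] above.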







Set

$$X := \sum_{k=2}^{n-1} X_k.$$



We see that, as $n \to \infty$, the dominant terms of $\mathbf{E}(X)$ are given by $k = O(1)$ and
$k = n - O(1)$. Indeed, by Stirling, and setting $k = \alpha n + 1$, we easily derive



\ 


/■ 1/2 r 




-I 


\k=sn / 


J £ 



|^a^’^(l — a)^^ — a)n\/27ra{l — a)nj nda (2) 

which goes to zero exponentially as $n$ goes to infinity for fixed $\varepsilon$. We obtain

e(x)~2 + ^2| + ?| + ... 

n ri'^ 



As an example of Xk behaviour, let us now turn to X3 and compute := E {X ^) . 

Combinatorial analysis leads to asymptotic series expressions for the moments
of $X_k$ and for the probabilities $\mathbb{P}(X_k = j)$. For example,

EX 3 - - - 4 + • • • EA| - - + ^ • 

n n 

and 

F(^3 = 0) ^ 1-- + ^+- - • P(X3 = 1) ^ . . F{Xs = 2) ^ ^ + . . . 

n n 

Details are given in [8]. We turn now to 2-intervals. 



3. The mean number of 2-intervals 



Let $N(k,n)$ denote the number of 2-intervals that are subsets of $[n]$ and have cardinality
$k$. We will take advantage of the uniformity of $G^{-1}$. For each of the $N(k,n)$
2-intervals of cardinality $k$, its inverse image under $G$ is uniformly distributed on
$k$-subsets of $[n]$. Therefore, for any given 2-interval of cardinality $k$, the probability
that its inverse image under $G$ is again a 2-interval is exactly $N(k,n)/\binom{n}{k}$.
Thus

$$\mathbf{E}Y_k = \frac{N(k,n)^2}{\binom{n}{k}}. \qquad (3)$$



To evaluate $N(k,n)$, note that there may be anywhere from 0 to $m_k :=
\min\{k-1, n-k\}$ “holes” in a 2-interval, where a hole is an element not in the
2-interval but between its endpoints. These may be enumerated by the following
procedure. Pick a starting position $r$ with $1 \le r \le n-k-i+1$ (where $i$ is the
number of holes) and let $r$ be the least element of the 2-interval. Choose any
sequence with $i$ occurrences of the word “skip” and $k-1-i$ occurrences of the
word “no-skip”. If the first word in the sequence is “no-skip” then $r+1$ is the
next element of the 2-interval; if the first word is “skip” then $r+2$ is the next
element. Continue in this manner until the sequence is used up. This method of
enumeration makes it clear that

$$\frac{N(k,n)^2}{\binom{n}{k}} = \frac{\Bigl(\sum_{i=0}^{m_k} (n+1-k-i)\binom{k-1}{i}\Bigr)^2}{\binom{n}{k}}. \qquad (4)$$
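The skip/no-skip enumeration can be compared with a direct count over subsets. The following is a minimal sketch under the assumption, read off from the procedure above, that a 2-interval is a subset of $[n]$ whose consecutive elements differ by at most 2; the function names are illustrative only.

```python
from itertools import combinations
from math import comb


def count_2_intervals_formula(k, n):
    """N(k, n) as the sum over the number i of holes, as in (4)."""
    m_k = min(k - 1, n - k)
    return sum((n + 1 - k - i) * comb(k - 1, i) for i in range(m_k + 1))


def is_2_interval(subset):
    """True if consecutive elements of the sorted tuple differ by at most 2."""
    return all(b - a <= 2 for a, b in zip(subset, subset[1:]))


def count_2_intervals_bruteforce(k, n):
    """Count k-subsets of [n] that are 2-intervals by direct enumeration."""
    return sum(1 for s in combinations(range(1, n + 1), k) if is_2_interval(s))


if __name__ == "__main__":
    n = 8
    for k in range(1, n + 1):
        print(k, count_2_intervals_formula(k, n), count_2_intervals_bruteforce(k, n))
```

Squaring the count and dividing by `comb(n, k)` then gives $\mathbf{E}Y_k$ exactly for small $n$.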







The sum may be evaluated exactly for $k/n < 1/2$ and asymptotically for $k/n > 1/2$,
leading to



N{k,n) ~ 






-oc, 



2^—1 Vk ^ 

(2/c— 1— n) 



if 



(3fe-l-2n)2 



\n-k) 



k—2n/3 

nV2 
• ^ k—2n/3 

* ni/2 



X 6 (— oo, oo), 

+ 00 , 



( 5 ) 



where is the expectation of the positive part of Z + a:: and Z is a standard 
normal. 

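Via equation (4), these asymptotics for $N(k,n)$ lead directly to asymptotics
for $\mathbf{E}Y_k$. To obtain asymptotics for $\mathbf{E}Y$, we then sum over $k$, using a saddle point
approximation. The only significant terms are near $k = \alpha_* n$, where $\alpha_*$ will be
determined shortly but is evidently greater than 2/3. We therefore use (5) with
$n \to \infty$ and $\alpha := k/n$ to obtain

$$\frac{N(k,n)^2}{\binom{n}{k}} \sim \frac{1}{\binom{n}{k}} \left(\frac{2k-n}{k}\right)^{2} \binom{k}{n-k}^{2} \left(\frac{2k-n}{3k-2n}\right)^{4} \sim A(\alpha)\, n^{-1/2} \exp[n F_1(\alpha)]$$

where

$$A(\alpha) := \frac{1}{\sqrt{2\pi\alpha(1-\alpha)}}\,\frac{(2\alpha-1)^{5}}{(3\alpha-2)^{4}},$$

$$F_1(\alpha) := 3\alpha\log\alpha - (1-\alpha)\log(1-\alpha) - 2(2\alpha-1)\log(2\alpha-1).$$

The maximum of $F_1$ occurs in $[2/3, 1]$ where $F_1'(\alpha)$ vanishes, which occurs when
$\alpha = \alpha_* \approx 0.7840013296$ solves

$$17\alpha^4 - 33\alpha^3 + 24\alpha^2 - 8\alpha + 1 = 0.$$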



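A saddle point approximation now gives us

$$\mathbf{E}Y = \sum_{k=2}^{n-1} \mathbf{E}Y_k \sim \sum_{k \ge 3n/4} A(k/n) \exp[n F_1(k/n)]\, n^{-1/2} \sim A(\alpha_*) \exp(n F_1(\alpha_*)) \sqrt{\frac{2\pi}{|F_1''(\alpha_*)|}}$$

$$= \exp(n F_1(\alpha_*)) \sqrt{\frac{(2\alpha_*-1)^{11}}{(3\alpha_*-2)^{8}\,(3-2\alpha_*)}} \approx 2.4253 \exp(0.40122467\,n).$$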
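The numerical constants in this approximation can be reproduced with a few lines of code. The sketch below is an illustration rather than the authors' computation: it locates the zero of $F_1'$ by bisection on $[0.7, 0.9]$ (an assumed bracket), and it uses the closed form $F_1''(\alpha) = -(3-2\alpha)/\bigl(\alpha(1-\alpha)(2\alpha-1)\bigr)$, which follows from differentiating $F_1$ twice.

```python
import math


def F1(a):
    """Exponential rate of E Y_k at k = a * n."""
    return (3 * a * math.log(a) - (1 - a) * math.log(1 - a)
            - 2 * (2 * a - 1) * math.log(2 * a - 1))


def F1_prime(a):
    """Derivative of F1; its zero satisfies a^3 (1 - a) = (2a - 1)^4."""
    return 3 * math.log(a) + math.log(1 - a) - 4 * math.log(2 * a - 1)


def quartic(a):
    """Polynomial whose root in (2/3, 1) is the saddle point."""
    return 17 * a**4 - 33 * a**3 + 24 * a**2 - 8 * a + 1


def solve_alpha_star(lo=0.7, hi=0.9, iterations=200):
    """Bisection on F1_prime, which is positive at lo and negative at hi."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if F1_prime(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def saddle_prefactor(a):
    """A(a) * sqrt(2*pi / |F1''(a)|), simplified to a single square root."""
    return math.sqrt((2 * a - 1) ** 11 / ((3 * a - 2) ** 8 * (3 - 2 * a)))


if __name__ == "__main__":
    alpha = solve_alpha_star()
    print(alpha, F1(alpha), saddle_prefactor(alpha), quartic(alpha))
```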



4. Counting pairs of 2-intervals 

We will now use the second moment method to show that $Y/\mathbf{E}Y$ is tight as $n \to \infty$
by establishing:

Theorem 4.1. $\mathbf{E}Y^2 = O\bigl((\mathbf{E}Y)^2\bigr)$.







Just as the mean of $Y$ may be obtained from a saddle point analysis of $\mathbf{E}Y_k$
near $k/n = \alpha_*$, we expect the second moment of $Y$ to be dominated by terms
with $k$ near some $\alpha_{**} n$. Because we have seen from numerical data that the
quenched and annealed behaviour are the same, we expect to find, and do find,
that $\alpha_{**} = \alpha_*$.

Again we will take advantage of symmetry. This time, if $I$ and $I'$ are
2-intervals, we will need to know the cardinality of their intersection before we can
say the probability that their inverse images under $G$ are both 2-intervals. We therefore
define $N(k,k',n,\kappa)$ to be the number of pairs of 2-intervals $(I,I')$ of $[n]$ with
$|I| = k$, $|I'| = k'$ and $|I \cap I'| = \kappa$. For the computation of $\mathbf{E}Y_k^2$ we will want
to specialise to the case $k = k'$, so we denote $N(k,n,\kappa) := N(k,k,n,\kappa)$. Our
computations will now be analogous to the computation of $F_1$ and its argmax,
$\alpha_*$. Specifically, letting

$$\alpha := \frac{k}{n}, \qquad \beta := \frac{k'}{n}, \qquad \rho := \frac{\kappa}{n},$$

we will find a rate function $\mathrm{rate}(\alpha,\beta,\rho)$ such that



$$N(k,k',n,\kappa) \sim A_0(\alpha,\beta,\rho)\, n^{-3/2} \exp\bigl(n\,\mathrm{rate}(\alpha,\beta,\rho)\bigr) \qquad (7)$$



for all parameter values in a range containing the dominant contributions to the
second moment of $Y$. To obtain the analogue of (6) for second moments, we need
the rate function for the total number of pairs of subsets $A$ and $B$ of $[n]$ with $|A| = k$,
$|B| = k'$ and $|A \cap B| = \kappa$. The total number is exactly

$$\binom{n}{\kappa,\; k-\kappa,\; k'-\kappa,\; n-k-k'+\kappa} = \frac{n!}{\kappa!\,(k-\kappa)!\,(k'-\kappa)!\,(n-k-k'+\kappa)!},$$

whence for any pair of sets of respective cardinalities $k$ and $k'$ whose intersection
has cardinality $\kappa \le k$, the probability that their image under $G^{-1}$ is a specific pair of sets is
the reciprocal, $P$, of this. Let ent denote the exponential rate $\lim n^{-1} \log P$ for this
probability:

$$\mathrm{ent} = \rho\log\rho + (\alpha-\rho)\log(\alpha-\rho) + (\beta-\rho)\log(\beta-\rho) + (1-\alpha-\beta+\rho)\log(1-\alpha-\beta+\rho). \qquad (8)$$

We may now evaluate the rate function

$$F_2(\alpha,\beta,\rho) := \lim_{n\to\infty,\; k/n\to\alpha,\; k'/n\to\beta,\; \kappa/n\to\rho} n^{-1} \log\bigl[N(k,k',n,\kappa)^2 P(k,k',n,\kappa)\bigr] = 2\cdot\mathrm{rate} + \mathrm{ent} \qquad (9)$$

for the expected number of pairs of 2-intervals of respective sizes $k$ and $k'$ with
overlap $\kappa$. Since $\mathbf{E}Y_k Y_{k'} = \sum_\kappa N(k,k',n,\kappa)^2 P(k,k',n,\kappa)$ is the
sum of polynomially many summands, it follows that the exponential order of $\mathbf{E}Y^2$
is the same as the order of the largest summand, namely

$$n^{-1} \log \mathbf{E}Y^2 \to \sup_{\alpha,\beta,\rho} F_2(\alpha,\beta,\rho) := \lambda_{**}. \qquad (10)$$

Furthermore, from the inequality $\mathbf{E}Y_k Y_{k'} \le \tfrac12\bigl(\mathbf{E}Y_k^2 + \mathbf{E}Y_{k'}^2\bigr)$,
we see that the maximum of $\mathbf{E}Y_k Y_{k'}$ (for fixed $n$) can only occur when $k' = k$, and therefore

$$\lambda_{**} = \sup_{\alpha,\rho} F_2(\alpha,\alpha,\rho). \qquad (11)$$

oc,p 

Next, consider how $N(k,k',n,\kappa)$ varies with $\kappa$ for fixed $(k,k',n)$. In other
words, enumerate pairs of 2-intervals of fixed sizes $k$ and $k'$ according to the size
of their intersection, $\kappa$. Observe that $\sum_\kappa N(k,k',n,\kappa)$ counts all pairs of 2-intervals
of sizes $k$ and $k'$, so that

$$\sum_\kappa N(k,k',n,\kappa) = N(k,n)\,N(k',n).$$
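For small $n$ the numbers $N(k,k',n,\kappa)$ can be tabulated directly, which makes the decomposition by intersection size concrete. This is a brute-force sketch under the assumption that a 2-interval is a subset of $[n]$ whose consecutive elements differ by at most 2; the helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def two_intervals(k, n):
    """All k-subsets of [n] whose consecutive elements differ by at most 2."""
    return [frozenset(s) for s in combinations(range(1, n + 1), k)
            if all(b - a <= 2 for a, b in zip(s, s[1:]))]


def pair_counts(k, k_prime, n):
    """Map kappa -> N(k, k', n, kappa), the number of pairs with |I & I'| = kappa."""
    first = two_intervals(k, n)
    second = two_intervals(k_prime, n)
    counts = Counter()
    for interval in first:
        for other in second:
            counts[len(interval & other)] += 1
    return counts


if __name__ == "__main__":
    counts = pair_counts(4, 5, 8)
    print(dict(counts))
    # the counts over all kappa add up to N(k, n) * N(k', n)
    print(sum(counts.values()), len(two_intervals(4, 8)) * len(two_intervals(5, 8)))
```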



Since the number of summands is linear in $n$, we have at the exponential level that

$$\sup_\rho \mathrm{rate}(\alpha,\beta,\rho) = u(\alpha) + u(\beta) \qquad (12)$$

where $u(\alpha) = \lim n^{-1} \log N(k,n)$ as $n \to \infty$ with $k/n \to \alpha$. We will show:

Lemma 4.2. For $\alpha$ and $\beta$ in a neighbourhood of $\alpha_{**}$, this supremum occurs at
$\rho = \alpha\cdot\beta$.



Observe that $-\mathrm{ent}(\alpha,\beta,\cdot)$ has a maximum at $\alpha\cdot\beta$ as well, where it is given
by the expression

$$\mathrm{ent}(\alpha,\beta,\alpha\cdot\beta) = h(\alpha) + h(\beta). \qquad (13)$$

Here, $h(x) = x\log(x) + (1-x)\log(1-x)$ is the usual entropy function. Both
rate and ent are smooth, so we see that the function $F_2 = 2\cdot\mathrm{rate} + \mathrm{ent}$, viewed
as a function of the third argument, has a critical point at $\rho = \alpha\cdot\beta$ as well.
The next lemma, requiring a four-variable generating function and considerable
computation, is:

Lemma 4.3. The critical point $(\alpha,\beta,\alpha\cdot\beta)$ is in fact a maximum for $F_2$.

Corollary 4.4. $\lambda_{**} = 2F_1(\alpha_*)$.

Proof. It follows from (12) and (13) that

$$\sup_\rho F_2(\alpha,\beta,\rho) = F_2(\alpha,\beta,\alpha\cdot\beta) = 2\cdot\mathrm{rate}(\alpha,\beta,\alpha\cdot\beta) + \mathrm{ent}(\alpha,\beta,\alpha\cdot\beta)$$

$$= 2u(\alpha) + 2u(\beta) + h(\alpha) + h(\beta) = F_1(\alpha) + F_1(\beta).$$

Taking the maximum over $\alpha$ and $\beta$ then gives the result. □
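Identity (13), and the fact that ent is smallest in its third argument at $\rho = \alpha\cdot\beta$, are easy to confirm numerically. The sketch below simply evaluates the expression (8); the sample values of $\alpha$, $\beta$ and the step 0.01 are arbitrary choices made for illustration.

```python
import math


def xlogx(x):
    """x * log(x), extended by continuity to x = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def ent(alpha, beta, rho):
    """The rate (8) of the reciprocal multinomial coefficient."""
    return (xlogx(rho) + xlogx(alpha - rho) + xlogx(beta - rho)
            + xlogx(1 - alpha - beta + rho))


def h(x):
    """The entropy function used in (13)."""
    return xlogx(x) + xlogx(1 - x)


if __name__ == "__main__":
    alpha, beta = 0.784, 0.6
    rho = alpha * beta
    print(ent(alpha, beta, rho), h(alpha) + h(beta))
    print(ent(alpha, beta, rho - 0.01), ent(alpha, beta, rho + 0.01))
```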



Theorem 4.1 now follows directly from Corollary 4.4 and: 



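Lemma 4.5. 1. There is a neighbourhood of $(\alpha_*, \alpha_*, \alpha_*^2)$ for which

$$N(k,k',n,\kappa)^2 P(k,k',n,\kappa) \sim A(\alpha,\beta,\rho)\, n^{-3/2} \exp\bigl(n F_2(\alpha,\beta,\rho)\bigr)$$

uniformly as $n \to \infty$ with $(k/n, k'/n, \kappa/n) \to (\alpha,\beta,\rho)$.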

2. The Hessian of F 2 here is non-degenerate, whence 



N{k, k', n, n)‘^P{k, k' , n, k) ~ 

k,k' ,K, 



.A(q;jic, Of*, o^) 



exp(2nFi(a*)) . 



We will prove Lemmas 4.2, 4.3 and 4.5 in the next section by means of the 
aforementioned four-variable generating function. 







5. The generating function



Recall that $N(k,k',\kappa,n)$ denotes the number of pairs of 2-intervals $(I,I')$ of $[n]$
with $|I| = k$, $|I'| = k'$ and $|I \cap I'| = \kappa$. Define the generating function

$$F(u_1,u_2,t,z) := \sum_{k,k',\kappa,n} N(k,k',\kappa,n)\, u_1^{k} u_2^{k'} t^{\kappa} z^{n}$$

so that $N(k,k',\kappa,n) = [u_1^{k} u_2^{k'} t^{\kappa} z^{n}]F$.

Given a pair of subsets of $[n]$, the positions 1 through $n$ may be divided into
three parts:

1. An initial sequence of positions before the first common position;

2. A common position followed by zero or more segments of the form: a sequence of positions not common to both sets, in such a way that no two positions in a row are absent from either set, followed by a common position;

3. A final sequence of positions after the last common position.

We give generating functions for the three parts separately. To enumerate 
the second of the three parts, note that each segment between common positions 
can be one of six possible classes of configuration. We list these here, along with 
the factor contributed by such a step to the generating function. 

(a) Empty, $f_a = 1$.

(b) A single position, which can belong to either set or neither, but not both, $f_b = z(1 + u_1 + u_2)$.

(c) A positive number of pairs $(j, j+1)$ where $j \in I \setminus I'$ and $j+1 \in I' \setminus I$. $f_c = z^2 u_1 u_2/(1 - z^2 u_1 u_2)$.

(c') A positive number of pairs $(j, j+1)$ where $j \in I' \setminus I$ and $j+1 \in I \setminus I'$. $f_{c'} = f_c$.

(d) The same as (c) but there is a single position in $I \setminus I'$ at the end. $f_d = z u_1 f_c$.

(d') The same as (c') but there is a single position in $I' \setminus I$ at the end. $f_{d'} = z u_2 f_{c'}$.

The generating function for an arbitrary sequence of these is

$$f = \frac{z u_1 u_2 t}{1 - z u_1 u_2 t\,\bigl(f_a + f_b + (2 + z(u_1 + u_2)) f_c\bigr)}. \qquad (14)$$



To enumerate the first (or last) of the three parts, we first write down the
generating function

Fq{Uj Z) := - — ■ r 

1 — (1 -h z)uz 

for that part of a 2-interval between its first and last point. By symmetry, parts 1
and 3 have the same generating function, which is equal to $1/(1-z)$ times the
generating function $g$ for the segment to the right of the last common position but
to the left of the last position in $I \cup I'$. We may write $g$ as the sum of several cases.

(e) Empty. $g_e = 1$.

(f) A position in neither set, followed by a non-empty string of positions, each of
which is in neither set or $I$, with no two in a row not in $I$. $g_f = zF_\theta(u_1,z)$.

(f') A position in neither set, followed by a non-empty string of positions, each of
which is in neither set or $I'$, with no two in a row not in $I'$. $g_{f'} = zF_\theta(u_2,z)$.

(g) A string of pairs as in case (c) above, followed by a nonempty sequence as in
case (f). $g_g = F_\theta(u_1,z)/(1 - z^2 u_1 u_2)$.

(g') A string of pairs as in case (c') above, followed by a nonempty sequence as in
case (f'). $g_{g'} = F_\theta(u_2,z)/(1 - z^2 u_1 u_2)$.

(h) The same as (g) except with a position in $I' \setminus I$ in the beginning.
$g_h = z u_2 F_\theta(u_1,z)/(1 - z^2 u_1 u_2)$.

(h') The same as (g') except with a position in $I \setminus I'$ in the beginning.
$g_{h'} = z u_1 F_\theta(u_2,z)/(1 - z^2 u_1 u_2)$.

Summing, we see that the factor from the first and last parts is

$$g = 1 + zF_\theta(u_1,z) + zF_\theta(u_2,z) + \frac{F_\theta(u_1,z) + F_\theta(u_2,z)}{1 - z^2 u_1 u_2} + \frac{u_2 z F_\theta(u_1,z) + u_1 z F_\theta(u_2,z)}{1 - z^2 u_1 u_2}$$

and finally,

$$F(u_1,u_2,t,z) = f g^2. \qquad (15)$$



6. Proofs of lemmas 



There is only room for sketches here. Details are given in [8]. 

Let $V$ be the variety where the rational function $F$ blows up, namely the
union of the varieties where the denominators of $f$ and $g$ vanish. It simplifies
the computation to change variables to $r := z t u_1 u_2$, $x = z u_1$ and $y = z u_2$. The
$[k,k',n,\kappa]$ coefficient of $F$ now becomes the $[k-\kappa,\; k'-\kappa,\; n-k-k'+\kappa,\; \kappa]$ coefficient
of the function $\tilde F(x,y,z,r)$. Thus

$$\frac{1}{n}\log[n\alpha, n\beta, n, n\rho]F = \frac{1}{n}\log[n\mu, n\nu, n\delta, n\rho]\tilde F$$

when $\alpha = \mu + \rho$, $\beta = \nu + \rho$, $\rho = \alpha + \beta + \delta - 1$ and $\tilde F$ is $F$ under the change of
variables. In the new variables,



r(l - xy) 

1 — xy — r(l + X + y + xy + zxy — z) 



We let $h$ denote the denominator of this expression. Then $V$ is the union of the
variety $V_h$ where $h$ vanishes, together with the varieties where the functions $g_1 :=
1 - z$, $g_2 := 1 - xy$, $g_3 := 1 - (1+z)x$ and $g_4 := 1 - (1+z)y$ vanish.
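The bookkeeping behind the change of variables can be verified mechanically by tracking exponent vectors. In the sketch below, which is only an illustration with hypothetical names, each new variable is represented by its exponents in $(u_1, u_2, t, z)$, following $x = zu_1$, $y = zu_2$ and $r = ztu_1u_2$.

```python
# Exponents of each new variable in the old variables (u1, u2, t, z).
X = (1, 0, 0, 1)  # x = z * u1
Y = (0, 1, 0, 1)  # y = z * u2
Z = (0, 0, 0, 1)  # z is unchanged
R = (1, 1, 1, 1)  # r = z * t * u1 * u2


def new_exponents(k, k_prime, kappa, n):
    """Exponents of (x, y, z, r) matching u1^k u2^k' t^kappa z^n."""
    return (k - kappa, k_prime - kappa, n - k - k_prime + kappa, kappa)


def old_exponents(ex, ey, ez, er):
    """Expand x^ex y^ey z^ez r^er back into exponents of (u1, u2, t, z)."""
    return tuple(ex * X[i] + ey * Y[i] + ez * Z[i] + er * R[i] for i in range(4))


if __name__ == "__main__":
    print(new_exponents(5, 4, 2, 9))                  # (3, 2, 2, 2)
    print(old_exponents(*new_exponents(5, 4, 2, 9)))  # (5, 4, 2, 9)
```

Dividing the new exponents by $n$ gives $(\mu, \nu, \delta, \rho)$, which matches the relations $\alpha = \mu + \rho$, $\beta = \nu + \rho$ and $\rho = \alpha + \beta + \delta - 1$.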

Let Dom denote the domain of convergence for the power series for

$$F = \sum_{\mathbf{r} := (k,k',n,\kappa)} a_{\mathbf{r}}\, \mathbf{x}^{\mathbf{r}}.$$

Let $\log D$ denote the logarithmic domain, which is always convex:

$$\log D := \{(\log|x_1|, \ldots, \log|x_4|) : \mathbf{x} \in \mathrm{Dom}\}.$$

Asymptotics of $a_{\mathbf{r}}$ in the direction $\mathbf{r}/|\mathbf{r}| \to \mathbf{s}$ are “controlled” by corresponding
points $\mathbf{x}(\mathbf{s}) \in V$ as described in the following paragraph (see [8] for details).

First, it is shown in [16, Theorem 6.3] that for each vector s G there 

is a point x(s) G V with positive real coordinates such that x G 9Dom and 
such that the normal to the support hyperplane to logD at logx is parallel to 
s. It then follows from [16, Theorem 3.5], provided a certain function has a non- 
degenerate Hessian, that if x(s) lies in some smooth component of V for all s in 
some neighbourhood 1^, then there is a function A such that the coefficients of 

E 

r: = {k,k' ,n,n) 



ttrX 







are uniformly asymptotically approximated on 3^ by 

Ur ~ A(— , — , — )n“^/^exp(-r • x(r)) . 
^3 n rs 



The key computational step necessary to make use of these results is to verify
that the point $\mathbf{x}(\mathbf{r})$ is in the part $V_h$ of the pole set $V$ that corresponds to the
vanishing of $h$. Combinatorially, the factor $f$ in $F$ enumerates pairs of subsets for
which the first and last common positions are 1 and $n$ respectively. The other
divisors correspond to the cases where one or both intervals begin at $\Theta(n)$ or end
at $n - \Theta(n)$. We now give an argument that the pairs of sets beginning at 1 and
ending at $n$ count “nearly all” pairs, meaning that the rate $\log N$ is the same.

Let N{k, fc', n) denote the number of pairs of 2-intervals in which both the 
subsets contains 1 and n. Suppose, for a given (fc, fc', /^, n) that 

W(/c, fc', An, k) > N{k, fc', n, k) (16) 

for some fixed A < 1 and arbitrarily large n (this can happen, for example when 
fi = a — p is small) . Then 

N{k^k\ Xn^ K,)^P{k,k' ^ Xn^ k) > N{k^k'^n^K)^P{k^k\Xn^K) 

> N{k, k' , n, n)‘^P{k, k\ n, n) 

by an exponential factor. But $\sup_{\alpha,\beta,\rho} F_2(\alpha,\beta,\rho) > 0$, so if this supremum is
achieved at $\alpha = k/n$, $\beta = k'/n$, $\rho = \kappa/n$, then the inequality would be reversed.
Thus, in a neighbourhood of where the supremum of $F_2$ is achieved, it is not
possible for (16) to hold. Consequently, in this neighbourhood the coefficients of
$f$ are, on the exponential scale, as large as those of $F$, whence the minimal point
occurs on $V_h$.

Proof of Lemma 4.2: Choose a pair of 2-intervals of sizes $k$ and $k'$ uniformly
from pairs where both contain 1 and $n$. Define the random variable $Z_{k,k',n}$ to
be $n^{-1}$ times the cardinality of the intersection of these two sets. If $Z_{k,k',n} \to \gamma$
in probability as $n \to \infty$ with $k/n \to \alpha$ and $k'/n \to \beta$, then the supremum
of $\mathrm{rate}(\alpha,\beta,\cdot)$ occurs at $\gamma$. But we see from the construction of the generating
function that the indicator functions of the two sets form two independent Markov
chains, both constrained to go from 1 to $n$ in a given number of steps. This implies
asymptotically independent statistics, meaning that the limit in probability of
$Z_{k,k',n}$ is $kk'/n^2$. □



Proof of Lemma 4.3: We argue here only what we need, namely the lemma for
$\beta = \alpha$ and $\alpha$ in a neighbourhood of $\alpha_*$. We have seen that rate and $-\mathrm{ent}$ both
have maxima at $(\alpha_*, \alpha_*, \alpha_*^2)$, so this is a matter of checking the second derivative
of $2\cdot\mathrm{rate} + \mathrm{ent}$ for negative definiteness and then checking that this local maximum
is in fact a global maximum for $F_2$. To do this, we evaluate the point $\mathbf{x}$ controlling
asymptotics of $F_2(\alpha,\alpha,\rho)$.

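Using Maple, we solve for $\mathbf{x} \in V_h$ and

$$\bigl(x_1\,\partial h/\partial x_1,\; x_2\,\partial h/\partial x_2,\; x_3\,\partial h/\partial x_3,\; x_4\,\partial h/\partial x_4\bigr)$$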







parallel to $\mathbf{s}$ for which $\alpha = s_1/s_3$, $\beta = s_2/s_3$, $\rho = s_4/s_3$. We get

2a - 1 - ii 

= = 2(a + <5-l)’ 

( 2 a-l--R)d 

“ 4(52 _ 6 j + 25R + 4Ja - 6a + 3 + 4a2 - i? ’ 

4(52 _ ^ 25/? + 45a - 6a + 3 + 4a2 - i? 

“ 25a - 5 - 5/? - 4a + 4a2 + 5 + 1 - 2ai? ’ 

here, $\delta = 1 - 2\alpha + \rho$ and

$$R := \sqrt{8\alpha^2 - 12\alpha + 5 + 8\delta\alpha - 8\delta + 4\delta^2}.$$

We take two derivatives and evaluate at $\rho = \alpha^2$, obtaining a rational function of
$\alpha$. Doubling and adding the second derivative of ent, we may verify negativity of
the second derivative in $\rho$ for all $\alpha > 2/3$. This establishes a local maximum at
$\rho = \alpha^2$. We have already seen that the global maximum of $F_2(\alpha,\beta,\rho)$ occurs at
$\beta = \alpha$. It is straightforward to check numerically that there is no maximum in a
set bounded away from the diagonal $\beta = \alpha$ exceeding the value of $F_2(\alpha,\alpha,\alpha^2)$ at
$\alpha = \alpha_*$. □



Proof of Lemma 4.5: For this we need to compute the solution $\mathbf{x}(\mathbf{s})$ in all four
variables, rather than just when $\beta = \alpha$. The result is rather messy and may be
found in [8]. In a neighbourhood of $(\alpha_*, \alpha_*, \alpha_*^2)$ we may use rigorous numerical
estimates to verify that the Hessian is non-degenerate. This establishes the second
assertion of the lemma. The first follows from [16, Theorem 3.5]. □



References 

[1] R. Arratia, L. Goldstein and L. Gordon. Two moments suffice for Poisson approximation: the Chen-Stein method. Annals of Probability, 17:9-25, 1989.

[2] R. Arratia, L. Goldstein and L. Gordon. Poisson approximation and the Chen-Stein method. Statistical Science, 5:403-424, 1990.

[3] Y. Baryshnikov and R. Pemantle. Manuscript in preparation.

[4] A. Bergeron and J. Stoye. On the Similarity of Sets of Permutations and Its Applications to Genome Comparison. COCOON 2003, Lecture Notes in Computer Science, 2697:68-79, 2003.

[5] A. Bergeron, S. Heber and J. Stoye. Common intervals and sorting by reversals: a marriage of necessity. Proceedings of ECCB 2002: 54-63, 2002.

[6] A. Bergeron, S. Corteel and M. Raffinot. The Algorithmic of Gene Teams. WABI 2002, Lecture Notes in Computer Science, 2452:464-476, 2002.

[7] K. S. Booth and G. S. Lueker. Testing for the Consecutive Ones Property, Interval Graphs, and Graph Planarity Using PQ-Tree Algorithms. J. Comput. Syst. Sci., 13(3):335-379, 1976.

[8] S. Corteel, G. Louchard and R. Pemantle. Common intervals in permutations (unabridged version).
http://www.math.upenn.edu/~pemantle/louchard/version040122.ps

[9] G. Didier. Common Intervals of Two Sequences. WABI 2003, Lecture Notes in Computer Science, 2812:17-24, 2003.

[10] S. Heber and J. Stoye. Algorithms for Finding Gene Clusters. WABI 2001, Lecture Notes in Computer Science, 2149:252-263, 2001.

[11] S. Heber and J. Stoye. Finding All Common Intervals of k Permutations. CPM 2001, Lecture Notes in Computer Science, 2089:207-218, 2001.

[12] I. Kaplansky. The asymptotic distributions of runs of consecutive elements. Annals of Mathematical Statistics, 16:200-203, 1945.

[13] S. Kobayashi, I. Ono and M. Yamamura. An Efficient Genetic Algorithm for Job Shop Scheduling Problems. ICGA 1995: 506-511, 1995.

[14] V.K. Kolchin, A.S. Sevastyanov, and P.C. Chistiakov. Random Allocations. Wiley, 1978.

[15] H. Mühlenbein, M. Gorges-Schleuter, and O. Krämer. Evolution algorithms in combinatorial optimization. Parallel Comput., 7:65-85, 1988.

[16] R. Pemantle and M. Wilson. Asymptotics of multivariate sequences, part I: smooth points of the singular variety. J. Comb. Theory, Series A, 97:129-161, 2001.

[17] R. Pemantle and M. Wilson. Asymptotics of multivariate sequences, part II: multiple points of the singular variety. Combinatorics, Probability and Computing, to appear.

[18] T. Uno and M. Yagiura. Fast Algorithms to Enumerate All Common Intervals of Two Permutations. Algorithmica, 26(2):290-309, 2000.

[19] J. Wolfowitz. Additive partition functions and a class of statistical hypotheses. Annals of Mathematical Statistics, 13:247-279, 1942.

[20] J. Wolfowitz. Note on runs of consecutive elements. Annals of Mathematical Statistics, 15:97-98, 1944.

Sylvie Corteel 

CNRS PRiSM, Université de Versailles Saint-Quentin, 45 Avenue des Etats-Unis,
78035 Versailles France
email: syl@prism.uvsq.fr

Guy Louchard

Université Libre de Bruxelles, Département d'Informatique, CP 212, Boulevard du
Triomphe, B-1050 Bruxelles, Belgium
email: louchard@ulb.ac.be

Robin Pemantle

Department of Mathematics, University of Pennsylvania, 209 S. 33rd Street,
Philadelphia, PA 19104 USA, Supported by NSF Grant # DMS-0103635
email: pemantle@math.upenn.edu




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Overpartitions and Generating Functions for 
Generalized Frobenius Partitions 

Sylvie Corteel, Jeremy Lovejoy, and Ae Ja Yee 

ABSTRACT: Generalized Frobenius partitions, or F-partitions, have recently
played an important role in several combinatorial investigations of basic
hypergeometric series identities. The goal of this paper is to use the framework of
these investigations to interpret families of infinite products as generating
functions for F-partitions. We employ q-series identities and bijective combinatorics.



1. Introduction 

Let $P_{A,B}(n)$ denote the number of generalized Frobenius partitions of $n$, i.e., the
number of two-rowed arrays,

$$\begin{pmatrix} a_1 & a_2 & \cdots & a_m \\ b_1 & b_2 & \cdots & b_m \end{pmatrix},$$

in which the top (bottom) row is a partition from a set $A$ ($B$), and such that
$\sum (a_i + b_i) + m = n$ [2]. The classical example is the case $P_{D,D}(n)$, where $D$ is the
set of partitions into distinct non-negative parts. Frobenius observed that these
objects are in one-to-one correspondence with the ordinary partitions of $n$, giving

$$\sum_{n=0}^{\infty} P_{D,D}(n)\, q^n = \prod_{n=1}^{\infty} \frac{1}{1 - q^n}. \qquad (2)$$
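Frobenius' correspondence can be checked for small $n$ by counting the two-rowed arrays directly. The sketch below is an illustration with hypothetical function names: it counts rows of strictly decreasing non-negative integers, pairs up rows of equal length $m$ with total weight $n - m$, and compares the result with the ordinary partition numbers.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def distinct_rows(total, parts, bound):
    """Strictly decreasing rows of `parts` non-negative integers, each < bound,
    with sum equal to `total`."""
    if parts == 0:
        return 1 if total == 0 else 0
    return sum(distinct_rows(total - first, parts - 1, first)
               for first in range(min(total, bound - 1), -1, -1))


def frobenius_count(n):
    """Number of arrays with both rows distinct and sum(a_i + b_i) + m = n."""
    count = 0
    for m in range(n + 1):
        weight = n - m
        for top in range(weight + 1):
            bottom = weight - top
            count += (distinct_rows(top, m, top + 1)
                      * distinct_rows(bottom, m, bottom + 1))
    return count


def partition_numbers(limit):
    """Ordinary partition numbers p(0), ..., p(limit) by the standard recurrence."""
    p = [1] + [0] * limit
    for part in range(1, limit + 1):
        for total in range(part, limit + 1):
            p[total] += p[total - part]
    return p


if __name__ == "__main__":
    print([frobenius_count(n) for n in range(11)])
    print(partition_numbers(10))
```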

Andrews [2] later made an extensive study of two infinite families of
F-partitions that begin with $P_{D,D}(n)$. He replaced $D$ by $D_k$ or $C_k$, the set of
partitions where parts repeat at most $k$ times and the set of partitions into distinct
parts with $k$ colors, respectively. The generating functions are multiple theta series,
which in three known cases can be written as an infinite product.






n=0 

oo 



iq)Uq^-,q^) 



, 2\2 

oo 



^ / . 2\ 



n=0 



Here we have employed the standard notation

$$(a_1, \ldots, a_j)_\infty := (a_1, \ldots, a_j; q)_\infty := \prod_{k=0}^{\infty} (1 - a_1 q^k) \cdots (1 - a_j q^k).$$

( 3 ) 

( 4 ) 

( 5 ) 

( 6 ) 







While Andrews' families subsequently received quite a bit of attention [10,
13, 15, 16, 18, 20], other types of Frobenius partitions have recently been turning
up as novel interpretations for some infinite products that figure prominently in
basic hypergeometric series identities [6, 7, 8, 21]. The combinatorial setting here
is that of overpartitions, which are partitions wherein the first occurrence of a part
may be overlined. Let $O$ denote the set of overpartitions into non-negative parts.
Then it turns out that

$$\sum_{m,n \ge 0} P_{D,O}(m,n)\, b^m q^n = \frac{(-bq)_\infty}{(q)_\infty} \qquad (7)$$

and

$$\sum_{\ell,m,n \ge 0} P_{O,O}(\ell,m,n)\, a^\ell b^m q^n = \frac{(-aq, -bq)_\infty}{(q, abq)_\infty}. \qquad (8)$$

Here $P_{D,O}(m,n)$ denotes the number of F-partitions counted by $P_{D,O}(n)$ that have
$m$ non-overlined parts in the bottom row, and $P_{O,O}(\ell,m,n)$ denotes the number
of objects counted by $P_{O,O}(n)$ that have $\ell$ non-overlined parts in the top row and
$m$ non-overlined parts in the bottom row.
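Identity (7) can be tested coefficient by coefficient for small $n$. The following sketch is an illustration only, with hypothetical names: it enumerates F-partitions whose top row has distinct non-negative parts and whose bottom row is an overpartition into non-negative parts, records the number of non-overlined bottom parts, and compares with the expansion of $\prod_{j \ge 1} (1 + bq^j)/(1 - q^j)$.

```python
from collections import Counter
from functools import lru_cache


@lru_cache(maxsize=None)
def distinct_rows(total, parts, bound):
    """Strictly decreasing rows of non-negative integers < bound with given sum."""
    if parts == 0:
        return 1 if total == 0 else 0
    return sum(distinct_rows(total - first, parts - 1, first)
               for first in range(min(total, bound - 1), -1, -1))


@lru_cache(maxsize=None)
def over_rows(total, parts, max_part):
    """Overpartition rows with `parts` non-negative parts <= max_part and given sum,
    returned as sorted (number of non-overlined parts, count) pairs."""
    if parts == 0:
        return ((0, 1),) if total == 0 else ()
    result = Counter()
    for value in range(min(total, max_part), -1, -1):
        copies = 1
        while copies <= parts and copies * value <= total:
            rest = over_rows(total - copies * value, parts - copies, value - 1)
            for plain, ways in rest:
                result[plain + copies] += ways      # no copy of `value` overlined
                result[plain + copies - 1] += ways  # first copy overlined
            copies += 1
    return tuple(sorted(result.items()))


def f_partition_counts(n):
    """Map m -> P_{D,O}(m, n), with m the number of non-overlined bottom parts."""
    counts = Counter()
    for length in range(n + 1):
        weight = n - length
        for top in range(weight + 1):
            rows = distinct_rows(top, length, top + 1)
            if rows:
                for plain, ways in over_rows(weight - top, length, weight - top):
                    counts[plain] += rows * ways
    return counts


def product_series(limit):
    """Coefficients of prod_{j>=1} (1 + b q^j)/(1 - q^j) up to q^limit,
    each stored as a Counter mapping the power of b to its coefficient."""
    coeffs = [Counter() for _ in range(limit + 1)]
    coeffs[0][0] = 1
    for j in range(1, limit + 1):
        for i in range(j, limit + 1):       # multiply by 1/(1 - q^j)
            coeffs[i].update(coeffs[i - j])
        for i in range(limit, j - 1, -1):   # multiply by (1 + b q^j)
            for power, ways in list(coeffs[i - j].items()):
                coeffs[i][power + 1] += ways
    return coeffs


if __name__ == "__main__":
    series = product_series(8)
    for n in range(9):
        print(n, dict(f_partition_counts(n)), dict(series[n]))
```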

Since a thorough combinatorial understanding of (7) and (8) has been so
useful, we give in this paper a variety of other infinite product generating functions
for F-partitions and begin to study them using bijective combinatorics. The first
goal is to use restricted overpartitions and a useful property of the ${}_1\psi_1$ summation
(see Lemma 2.2) to embed some of (2) - (8) in families of infinite products that
generate F-partitions.



Theorem 1.1. Let $O_k$ be the set of overpartitions where the non-overlined parts
occur less than $k$ times. Let $P_{O_k,O_k}(m,n)$ (resp. $P_{O_k,O}(m,n)$) be the number of
F-partitions counted by $P_{O_k,O_k}(n)$ (resp. $P_{O_k,O}(n)$) wherein the number of overlined
parts on the top minus the number of overlined parts on the bottom is $m$. Then

$$\sum_{m,n \ge 0} P_{O_k,O_k}(m,n)\, b^m q^n = \frac{(-bq)_\infty (-q/b)_\infty (q^k; q^k)_\infty}{(q)_\infty^2\, (-bq^k, -q^k/b; q^k)_\infty} \qquad (9)$$

$$\sum_{m,n \ge 0} P_{O_k,O}(m,n)\, b^m q^n = \frac{(-bq)_\infty (-q/b)_\infty (q^k; q^k)_\infty}{(q)_\infty^2\, (-q^k/b; q^k)_\infty} \qquad (10)$$



Notice that in both instances the case $k \to \infty$ is the case $a = 1/b$ of (8),
while the case $k = 1$ of (9) is Frobenius' example (2), the case $b = 1$, $k = 2$ of (9)
is Andrews' (5), and the case $k = 1$ of (10) is (7).

Our next object is to exhibit more families like those above, but where the 
base cases are none of (2) - (8). We use the notation AB for the set of vector 
partitions {Xa,^b) G Ax B, and for the set of partitions into non-negative 

parts where each part occurs 0 or fc times. 



Theorem 1.2. 



Y.Po,.OD-{n)q^ 

n=0 



n=0 



{-q)lo{q^\q’^)^{q’"',q'^^)lo 

{q)lo{q’'A^'^'^q^^)oo 



( 11 ) 



( 12 ) 







Then, by employing more general $q$-series identities, we find generating
functions with more parameters, like Theorems 1.3 - 1.6 below. The first two contain
the $k = 1$ case of (11) and the case $k = 2$ of (10), respectively.

Theorem 1.3. Let $P_{D,OD}(m,n)$ be the number of F-partitions counted by $P_{D,OD}(n)$
that have $m$ parts in $\lambda_D$. Then

$$\sum_{m,n \ge 0} P_{D,OD}(m,n)\, y^m q^n = \frac{(-q; q)_\infty (-yq; q^2)_\infty}{(q; q)_\infty}. \qquad (13)$$



Theorem 1.4. Let denote the number of overpartitions in O where the non- 
overlined parts repeat an even number of times. Let denote the 

number of F -symbols counted by Po,o^d{'^) where i is the number of non- overlined 
parts in the top row minus the number of parts in Xjj and m is the number of non- 
overlined parts in the bottom row. Then 



£,m,n>0 



{-aq)oo{~q/a, q^)oo 
{q)oo{q, a26V;9^)oo 



(14) 



The next theorem also contains the case k = 2 of (10). We are concerned 
here with 5, which denotes the set of overpartitions into distinct parts such that 
parts have to differ by at least two if the bigger is overlined and 0 does not occur. 
These overpartitions have recently arisen in a number of works [5, 14, 17]. 



Theorem 1.5. Let P OD, Q(^,m,n) denote the number of overpartitions counted by 
PoD,o(^) ^b>at £ is the number of non-overlined parts in Xq plus the number 
of overlined parts in X^ and such that m is the number of non-overlined parts on 
the bottom minus the number of parts in X^ . Then 



X] PoD,oi^^m,n)a^b^q'^ 

£,m,n>0 



{-aq,-bq)oo{-q/b-,q^)oo 

{q,abq)oo{q-, q‘^)oo 



(15) 



The last example contains (8) and deals with O', the set of overpartitions in 
O that have no 0. 



Theorem 1.6. Let $P_{OO',O}(k,\ell,m,n)$ denote the number of F-partitions counted by
$P_{OO',O}(n)$ where $k$ is the number of non-overlined parts in $\lambda_O$ plus the number of
overlined parts in $\lambda_{O'}$, $\ell$ is the number of non-overlined parts in the bottom row,
and $m$ is the number of parts in $\lambda_{O'}$. Then

$$\sum_{k,\ell,m,n \ge 0} P_{OO',O}(k,\ell,m,n)\, a^k b^\ell c^m q^n = \frac{(-aq, -bq, -cq)_\infty}{(q, abq, bcq)_\infty}. \qquad (16)$$

Finally, we give bijective proofs for some of the generating functions above.
We are able to establish (5), (13), and the case $k = 2$ of (10) in this way.



2. Recollections and Proofs 

Given a set $A$ of partitions we denote by $P_A(n,k)$ the number of partitions of $n$
from the set $A$ having $k$ parts. We recall from [2] that

Lemma 2.1. The generating function for Frobenius partitions is given by

$$\sum_{n=0}^{\infty} P_{A,B}(n)\, q^n = [z^0] \sum_{n,k} P_A(n,k)\, q^n (zq)^k \sum_{n,k} P_B(n,k)\, q^n z^{-k}, \qquad (17)$$

where $[z^k] \sum A_n z^n = A_k$.

We assume enough familiarity with the elementary theory of partitions and
overpartitions [1, 8] that we can state generating functions for simple $P_A(n,k)$
without explanation. The following is the key lemma mentioned in the introduction.

Lemma 2.2. If

$$[z^0] \frac{(-bzq, -1/bz)_\infty}{(zq, 1/z)_\infty}\, G(z,q) = \frac{(-bq, -q/b)_\infty}{(q)_\infty^2}\, H(q),$$

then

$$[z^0] \frac{(-bzq, -1/bz)_\infty}{(zq, 1/z)_\infty}\, G(z^k, q^k) = \frac{(-bq, -q/b)_\infty}{(q)_\infty^2}\, H(q^k).$$

Proof. Let $H(q) = [z^0] F(z,q) G(z,q)$, with $F(z,q) = \sum A_j(q) z^j$ and $G(z,q) =
\sum B_j(q) z^j$. If $A_{kj}(q) = A_j(q^k)$, then

$$[z^0] F(z,q) G(z^k, q^k) = \sum_j A_{-kj}(q) B_j(q^k) = \sum_j A_{-j}(q^k) B_j(q^k) = H(q^k).$$

The proof is finished when we apply the above observation to 

Substituting a = If b and z = bz in the ixpi summation, 

{-l/a)n{azqr ^ {q,abq, -zq, -1 /z)oq 
{- bq)n {-bq, -aq, azq, b/z)^ ’ 

we have 



{q)l. 

and the lemma follows 



{-bq,-qlb,zq,llz)oo' 

-bq, -g/fe)oo r,.0irn/.. (~6g, -g/b)oo „,_k^ 



Above we have introduced the notation

$$(a)_n := (a; q)_n := \frac{(a; q)_\infty}{(aq^n; q)_\infty}.$$







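Proof of Theorem 1.1. For the first part, take $G(z,q) = (zq, 1/z)_\infty$ in Lemma 2.2.
By (2), $H(q) = (q)_\infty / \bigl((-bq)_\infty (-q/b)_\infty\bigr)$. Then

$$\sum_{m,n} P_{O_k,O_k}(m,n)\, b^m q^n = [z^0] \frac{(-bzq, -1/bz)_\infty\, (z^k q^k, 1/z^k; q^k)_\infty}{(zq, 1/z)_\infty} = [z^0] \frac{(-bzq, -1/bz)_\infty}{(zq, 1/z)_\infty}\, G(z^k, q^k)$$

$$= \frac{(-bq, -q/b)_\infty}{(q)_\infty^2}\, H(q^k) = \frac{(-bq)_\infty (-q/b)_\infty (q^k; q^k)_\infty}{(q)_\infty^2\, (-bq^k, -q^k/b; q^k)_\infty}.$$

Similarly, take $G(z,q) = (zq)_\infty$ for the second part. □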

In the following we will use the ${}_1\psi_1$ summation (21) or one of its corollaries
for the first step of each proof.

Proof of Theorem 1.2. For the first part, take G{z,q) = {-l/zjZq)oo in the case 
6 = 1 of Lemma 2.2. Then 



m 



{q)lo r 01 i-Zq-,q)oo{~Z ^■,q)oo{-Z 

i-q,q)V ^ {z-^-,q)oo 

r^oi {q-,q)oc ^ ^ z-^qi^) 

(-g;g)oc f^ g"' 

(g;g)oo “(g^;g^)« 

(g; g)oo(-g;g^)oo 

( g) g)oo 



by the ^-binomial theorem. 



_ (n-^)oo 

„=0 



(24) 



For the second part we again apply the case 6 = 1 of Lemma 2.2, this time with 
G(z,q) = {z~'^,zq)oo- Then 



H{q) = 



(g) 



oo r_0 



[z^]{-zq-,q)oo{-z ^g)oo(--z ^g)c 



z'’] ^ zn ^ n { n + l )/2 ^ 



n€Z 



i~q,q) 

(g)oo 

i-q)lo 

(g)oo ^ g” 

(-^)^ h 

(g)oo 

(-g)?o(g.g'‘;g®)oo’ 



ry — n ^ n ( n — l ) f 2 

n{n+l)/2 ^ g ' 



n>0 



{q)r 







by the first Rogers-Ramanujan identity, 



“(5)n {q,q^;q^)oo' 



( 25 ) 

□ 



Proof of Theorem 1.3. 



Y PD,OD{m,n)y'^q'^ = 

m,n>0 



oA-zq\q)oo{-z ^;q)oo{-yz ^g)c 



{z-^',q)c 



= Z 



oi(-q>q) 



oo 

CX) ^ 



E 



y^z 



(9; 9) 

(-g;g) 

{q-,q)oo 

{-q\q)oo{-yq;q^ 



(n+l) ^ 

{-q',q)n {q-,q) 



y q 



T — 

^ (rfi- 



iq^ q)oo 

the final equality being the case q = q^.z = —q/a, and a — > oo of (24). 

Proof of Theorem 1.4. 



□ 



P ^ - r,oi(z£«iZlZ£lZl(M 

2 ^ Po,o^D{i,m,n)a b q [z ] ^2). 



£,m,n >0 



oo 

oo 



r oi(-«9,-Moo (-l/a)„(a2:g)" (l/a6)„(-fi/^)’' 

= 1-r-^^ 2.^ 7 ^ 2^ - 



{q, abq) 






n 



n=0 



{q)n 



{-aq,-bq)^ ^ (-l/o, l/afi)„(-a6q)^ 



E 



{q,abq)^ ^ {-bq,q)n 

^ {-aq)oo{-q/a, -ab'^q^-, g^)oo 

iq)oo{q,a%‘^q‘^-,q^)oo 

by the ^-Kummer identity, 

^ (q,b)n(-g/ 5 )” _ {aq,aq^/b‘^-,q^) 

^ {q,aq/b)n 

Proof of Theorem 1.5. 



Y PoD,oi^^m,n)a‘b’^q 

£,m,n >0 

0i (-«9, -ft?) 



{-q/b, aq/b)oo{q;q^)oo' 

0i (-^9, -1/2)00 (-a^)„g"(”+^)/2(z/6) 



(26) 



{azq,b/z)c 



E 

n=0 



(«)r 






(9, a^9)c 



^ {-llbUbIzr yo (-ag)„g»("+i)/2(z/6)^ 

^ (-09)r ^ 



Tl^Z 



71=0 



(9). 



(- ag , - 6 q)oo ^ (- l / 6 )„ q '"("+^^/2 



(9, «^>9)c 



n=0 



(^)r 



(-gg, -bq)oc{-q/b-, q^)^ 
{q, abq)^{q; q'^)^ 







the final equality being Lebesgue’s identity which is the case 6 — > oo and a = —1/6 
of (26). □ 

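Proof of Theorem 1.6.

\[
\begin{aligned}
\sum_{k,\ell,m,n \ge 0} P_{OO',O}(k,\ell,m,n)\, a^k b^\ell c^m q^n
&= [z^0]\, \frac{(-zq, -1/z, -aczq^2)_\infty}{(azq, b/z, czq)_\infty} \\
&= [z^0]\, \frac{(-aq, -bq)_\infty}{(q, abq)_\infty} \sum_{n \in \mathbb{Z}} \frac{(-1/b)_n (b/z)^n}{(-aq)_n} \sum_{n \ge 0} \frac{(-aq)_n (czq)^n}{(q)_n} \\
&= \frac{(-aq, -bq)_\infty}{(q, abq)_\infty} \sum_{n \ge 0} \frac{(-1/b)_n (bcq)^n}{(q)_n} \\
&= \frac{(-aq, -bq, -cq)_\infty}{(q, abq, bcq)_\infty}
\end{aligned}
\]

by the $q$-binomial theorem (24). □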



It should be noted that Theorems 1.4 - 1.6 have $k$-generalizations like Theorems 1.1 and 1.2, but the combinatorial definitions of the F-partitions are less palatable.



3. Bijections 

In this section we establish some bijections that explain some of the first cases.



Bijection for (5). We will here give a combinatorial proof of

\[ \sum_{n=0}^{\infty} P_{O_2,O_2}(n)\, q^n = \frac{(-q;q)_\infty (-q;q^2)_\infty^2}{(q;q)_\infty}. \]



We start with an F-partition and add one to each entry of the first row. Let $k$ be
the number of overlined parts in the first row minus the number of overlined parts
of the second row. Suppose without loss of generality that $k \ge 0$. Then we split the
F-partition into two F-partitions: one that contains the overlined parts and one
that contains the non-overlined parts. Apply Wright's bijection (see [20]) to both
F-partitions and get two ordinary partitions and two triangles $(k, k-1, \ldots, 1)$ and
$(k-1, \ldots, 1)$. We keep the first partition, which gives $1/(q;q)_\infty$, and the odd parts
of the second partition, which give $1/(q;q^2)_\infty$. Then we divide the even parts of
the second partition by 2. To the left of these parts we put $k, k-1, \ldots$ By
applying the reverse of Wright's bijection, we obtain two partitions into distinct
parts. We multiply by 2 and decrease by 1 the parts of the first partition, and
we multiply by 2 and increase by 1 the parts of the second partition. We get two
partitions into distinct odd parts, which are generated by $(-q;q^2)_\infty^2$. It is easy to
check that the weight is preserved and that every step is reversible.



Bijection for (10). Now let us prove combinatorially that

\[ \sum_{n=0}^{\infty} P_{O_2,O}(n)\, q^n = \frac{(-q;q)_\infty (-q;q^2)_\infty}{(q;q)_\infty (q;q^2)_\infty}. \]



This proof is inspired by some ideas of [21]. We start with the Frobenius partition
and add one to each entry of the first row. The top row is an overpartition in
$O_2$. Therefore the non-overlined parts form a partition into distinct parts. Let us
suppose that this top row has $n$ non-overlined parts and $k$ overlined parts. We
separate these into partitions $\alpha$ and $\beta$. The bottom row is an overpartition into
$n + k$ non-negative parts. Using algorithm Z [4, 22], we can decompose it into a
partition $\delta$ into parts at most $n + k$ and a partition $\gamma$ into non-negative distinct
parts less than $n + k$.

Figure 1. Bijection for (10).

We take the parts of $\gamma$ that are less than $n$, conjugate them, and add them to
$\alpha$ to create an overpartition into distinct parts (a part is overlined if the difference
with the previous part is at least 2). Then we change $\alpha$ into an overpartition into
odd parts, generalizing Sylvester's bijection [19]. First we take off $2\lfloor (n-1)/2 \rfloor + 2 - i$
from the $i$th part and change it to $\eta$, a $(2\lfloor (n-1)/2 \rfloor + 1) \times \lceil n/2 \rceil$ rectangle. Then
we look at the conjugate of what is left. Every odd part is inserted in $\eta$ and every
even part $2i$ is changed to two parts $i$ that are inserted in the conjugate of $\eta$. The
overlines follow. Note that this part is a bijective proof of Lebesgue's identity.

Now we take the parts of $\gamma$ that are equal to or greater than $n$. If the part
$n + k - i$ occurs we add it to the $i$th part of $\beta$ and take off the overline. After sorting,
that creates an overpartition where the non-overlined parts are greater than $n + k$.
We then add $\delta$ and get an overpartition, which is generated by $(-q;q)_\infty/(q;q)_\infty$.



For example, we start with

\[ \begin{pmatrix} 8 & 6 & 4 & 4 & 3 & 2 & 1 & 0 & 0 \\ 10 & 9 & 5 & 3 & 3 & 1 & 1 & 0 & 0 \end{pmatrix}. \]

We get $\alpha = (7, 5, 4, 3, 1)$, $\beta = (9, 5, 2, 1)$, $\gamma = (7, 6, 4, 1)$ and $\delta = (6, 4, 4)$. Then we apply the mapping and
get $\eta = (11, 9, 5, 1, 1)$ and $\mu = (11, 9, 9, 6, 4, 4, 1)$. See Fig. 1.



Bijection for (13). We will here give a combinatorial proof of

\[ \sum_{n=0}^{\infty} P_{D,OD}(n)\, q^n = \frac{(-q;q)_\infty (-q;q^2)_\infty}{(q;q)_\infty}. \]

We start with a Frobenius partition and add one to each entry of the first
row. The bottom row is made of an overpartition $\beta$ and a partition into distinct
parts $\alpha$. Let $n$ be the number of parts of $\alpha$. We apply a generalization of Wright's
bijection to the top row and $\beta$. We draw the Ferrers diagram of the top row and
we shift the $i$th part by $i - 1$. We draw the Ferrers diagram of the non-overlined
parts of $\beta$ and put it at the left of the diagram of the top row. We draw the Ferrers
diagram of the overlined parts of $\beta$, conjugate it and put it at the bottom right of
the diagram of the top row, as shown on the left of Figure 2. Then we break the
diagram into two parts: the left and the right of the largest overlined part. On the
right we get an ordinary partition, which gives $1/(q;q)_\infty$, and on the left we get a
partition $\lambda$ into distinct parts where all the parts from 1 to $n$ occur.









Figure 2. Bijection for (13) 



Let $i$ be the index of the smallest even part in $\alpha$. Then we take off $i$ from
$\lambda$ and add it to the conjugate of $\alpha$. We do that until $\alpha$ has only odd parts. We
finally get $\lambda$, a partition into distinct parts, which is generated by $(-q;q)_\infty$, and $\alpha$,
a partition into distinct odd parts, which is generated by $(-q;q^2)_\infty$. Each step is
easily reversible.



For example, starting with

\[ \begin{pmatrix} 11, 10, 8, 7, 6, 4, 3, 2, 1 \\ (4, 3, 3, 2, 0),\ (7, 4, 2, 1) \end{pmatrix} \]

we get $\beta = (5, 5, 4, 4, 4, 3, 3, 3, 3, 2, 2)$, $\lambda = (9, 8, 4, 2)$ and $\alpha = (9, 5, 3, 1)$. See Fig. 2.



Bijection for (15). A combinatorial proof of

\[ \sum_{n=0}^{\infty} P_{OD,O}(n)\, q^n = \frac{(-q;q)_\infty^2 (-q;q^2)_\infty}{(q;q)_\infty^2 (q;q^2)_\infty} \]

can be easily done by mixing the combinatorial proof of the $_1\psi_1$ summation of [21]
and a combinatorial proof of Lebesgue's identity, for example the one used in
the previous bijection.



Bijection for (16). A combinatorial proof of

\[ \sum_{n=0}^{\infty} P_{OO',O}(n)\, q^n = \frac{(-q;q)_\infty^3}{(q;q)_\infty^3} \]

can be easily done by mixing the combinatorial proof of the $_1\psi_1$ summation of [21]
and a combinatorial proof of the $q$-binomial identity (see for example [12]).



Acknowledgments 

We would like to thank Bruce Berndt for making [15] available to us. 



References 

[1] G.E. Andrews, The Theory of Partitions, Cambridge University Press, Cambridge,
1998.

[2] G.E. Andrews, Generalized Frobenius partitions, Mem. Amer. Math. Soc. 49 (1984),
no. 301.

[3] G.E. Andrews, q-Series: Their Development and Application in Analysis, Number
Theory, Combinatorics, Physics, and Computer Algebra, CBMS 66, American Mathematical
Society, Providence, 1986.

[4] G.E. Andrews and D.M. Bressoud, Identities in combinatorics, III: Further aspects
of ordered set sorting, Discrete Math. 49 (1984), 223-236.

[5] D. Corson, D. Favero, K. Liesinger, and S. Zubairy, Characters and q-series in Q(√2),
preprint.

[6] S. Corteel, Particle seas and basic hypergeometric series, Adv. Appl. Math. 31 (2003),
199-214.

[7] S. Corteel and J. Lovejoy, Frobenius partitions and the combinatorics of Ramanujan's
1ψ1 summation, J. Combin. Theory Ser. A 97 (2002), 177-183.

[8] S. Corteel and J. Lovejoy, Overpartitions, Trans. Amer. Math. Soc., to appear.

[9] N.J. Fine, Basic Hypergeometric Series and Applications, American Mathematical
Society, Providence, RI, 1988.

[10] F. Garvan, Partition congruences and generalizations of Dyson's rank, PhD Thesis,
Penn State, 1986.

[11] G. Gasper and M. Rahman, Basic Hypergeometric Series, Cambridge University
Press, Cambridge, 1990.

[12] J.T. Joichi and D. Stanton, Bijective proofs of basic hypergeometric series identities,
Pacific J. Math. 127 (1987), 103-120.

[13] L.W. Kolitsch, Some analytic and arithmetic properties of generalized Frobenius
partitions, PhD Thesis, Penn State, 1985.

[14] J. Lovejoy, Gordon's theorem for overpartitions, J. Combin. Theory Ser. A 103 (2003),
393-401.

[15] Padmavathamma, Studies in generalized Frobenius partitions, PhD Thesis, Univ.
of Mysore, 1985.

[16] J. Propp, Some variants of Ferrers diagrams, J. Combin. Theory Ser. A 52 (1989),
no. 1, 98-128.

[17] J.P.O. Santos and D.V. Sills, q-Pell sequences and two identities of V.A. Lebesgue,
Discrete Math. 257 (2002), 125-143.

[18] J. Sellers, New congruences for generalized Frobenius partitions with two or three
colors, Discrete Math. 131 (1994), 367-373.

[19] J.J. Sylvester, A constructive theory of partitions in three acts, an interact and an
exodion, in Collected Math. Papers, vol. 4, pp. 1-83, Cambridge Univ. Press, London
and New York, 1912; reprinted by Chelsea, New York, 1974.

[20] A.J. Yee, Combinatorial proofs of generating function identities for F-partitions, J.
Combin. Theory Ser. A 102 (2003), 217-228.

[21] A.J. Yee, Combinatorial proofs of Ramanujan's 1ψ1 summation and the q-Gauss
summation, preprint.

[22] D. Zeilberger, A q-Foata proof of the q-Saalschütz identity, European J. Combin. 8
(1987), 461-463.

Sylvie Corteel 

CNRS, PRiSM, Email: syl@prism.uvsq.fr 

Jeremy Lovejoy 

CNRS, LaBRI, Email: lovejoy@labri.fr 

Ae Ja Yee 

Pennsylvania State University, Email: yee@math.psu.edu 




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Enumerative Results on Integer Partitions
Using the ECO Method

Luca Ferrari, Renzo Pinzani, and Simone Rinaldi 

ABSTRACT: In this paper we apply the ECO method to the study of some
enumerative properties of integer partitions. In so doing, we both give an original
description of some known constructions regarding partitions and propose some
results, especially in the context of generalized hook partitions (i.e., partitions
whose Ferrers diagrams fit inside of a suitable hook shape).

1. Introduction 

The main goal of this work is to present an alternative approach to the study 
of enumerative properties of integer partitions. Namely, we are going to apply 
the ECO¹ method in order to effectively construct integer partitions. After briefly
recalling the basics of the ECO method, with particular emphasis on those notions 
we will need in the sequel, we give a general ECO construction for partitions. We 
then modify such a construction to deal with restricted classes of partitions. Next 
we propose a second (less classical) construction of partitions and make use of it 
to propose a bijective approach to Lecture Hall partitions, leading to a challenging 
open question. Finally, we state some new results on generalized hook partitions, 
also proposing an open problem on partitions fitting into the intersection of two 
hooks having different shapes. 

Our notation will be quite classical. In the partition $\lambda = (p_1, \ldots, p_l)$ the
parts are in decreasing order, that is $p_1 \ge p_2 \ge \cdots \ge p_l$. If $\lambda$ is a partition of
$n$, then we write $\lambda \vdash n$ or $|\lambda| = n$. The whole class of integer partitions will be
denoted $\mathcal{P}$.

2. Background 

We call an ECO system a purely formal system of the kind

\[ \begin{cases} (a) \\ (k) \rightsquigarrow (e_1(k))(e_2(k)) \cdots (e_k(k)) \end{cases} \tag{1} \]

where $a, k, e_1(k), \ldots, e_k(k)$ are positive integers. The label $(a)$ is called the axiom of the ECO
system; the second row is just a shortcut in order to express a set of productions of
the denoted form, for $k$ running over a (finite or infinite) set of positive integers.
Thus, for example, the following are typical examples of ECO systems:



(*) : 


1 (2) 

\ (fc)^(2)(3)(4)--.(fc)(fc + l) > 






(**) : 


{ [k) ^ + ’ (***):■ 


f (2) 

(1) - (2) . 
1 (2) - (1)(2) 


(2) 



¹We recall that ECO stands for Enumeration of Combinatorial Objects.







Here we tacitly assume that the ECO systems (*) and (**) have, as labels,
all the positive integers (greater than 2), whereas the ECO system (***) contains
only the labels (1) and (2).

Among the many ways of interpreting an ECO system, a well-known and pictorial
one is that of drawing its generating tree, which is, by definition, the infinite,
rooted, labelled tree whose root is labelled $(a)$ (the axiom) and such that every
node labelled $(k)$ produces exactly $k$ sons, labelled respectively $(e_1(k)), \ldots, (e_k(k))$.
For instance, the generating tree related to the ECO system (*) looks as follows:




Figure 1. The first 4 levels of the generating tree of (*). 

From an enumerative point of view, one of the main pieces of information encoded
by an ECO system is the numerical sequence $(f_n)_{n \in \mathbb{N}}$ such that $f_n$ is the number
of nodes at level $n$ in the generating tree related to the ECO system (the
concept of level in a generating tree like that in Fig. 1 is straightforward). Actually,
it often happens that ECO systems appear as codings of special combinatorial
constructions, in which the objects of a given class are recursively constructed by
performing a sort of local expansion on them. This kind of combinatorial construction
is the core of the ECO method, for which we refer the reader to the paper
[BDLPP] introducing it.
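The sequence $(f_n)$ can be computed mechanically from a succession rule. The following Python sketch is only an illustration: the function `level_counts` and the encoding of a production as a Python function are our own conventions, not part of the ECO formalism. It keeps, for each level, the number of nodes carrying each label, and applies the production once per level. The two rules shown are (*) and (***), the latter read as axiom (2) with productions (1) → (2) and (2) → (1)(2).

```python
from collections import Counter

def level_counts(axiom, production, levels):
    """Return [f_0, ..., f_{levels-1}], the number of nodes at each level of
    the generating tree with the given axiom.  `production(k)` must return
    the labels of the sons of a node labelled (k)."""
    current = Counter({axiom: 1})
    counts = []
    for _ in range(levels):
        counts.append(sum(current.values()))
        nxt = Counter()
        for label, mult in current.items():
            for son in production(label):
                nxt[son] += mult
        current = nxt
    return counts

def catalan_rule(k):
    """Rule (*): (k) -> (2)(3)...(k)(k+1)."""
    return list(range(2, k + 2))

def fibonacci_rule(k):
    """Rule (***): (1) -> (2), (2) -> (1)(2)."""
    return [2] if k == 1 else [1, 2]
```

With axiom (2), `level_counts(2, catalan_rule, 6)` returns the Catalan numbers 1, 2, 5, 14, 42, 132, and `level_counts(2, fibonacci_rule, 6)` returns the Fibonacci numbers 1, 2, 3, 5, 8, 13.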

Throughout our work we will be mainly concerned with some modifications 
of the classical concept of ECO system (or succession rule, as it is often called) 
given above. The first generalization we need is that of mixed succession rule. 

We start by introducing the concept of mixed generating tree: it is a rooted, 
labelled tree whose edges can have different “lengths” (whereas in the classical 
case the length of each edge is equal to 1). The lengthened level (briefly, level) of a 
node in a mixed generating tree is defined as the sum of the lengths of the edges 
connecting the root to that node. 

Now consider a set of production rules (instead of a single one, as we do in a
classical ECO system) and make it act on every label, each rule producing its sons
at a different level. What we get is precisely a mixed generating tree where a set of
$h$ (possibly different) production rules is involved: every node produces sets of sons
at $h$ different successive levels, each set according to one of the defining succession
rules. A simple example of a mixed succession rule is the following. Consider the
two production rules:

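\[ (k) \rightsquigarrow (2)(3)\cdots(k)(k+1), \qquad (k) \rightsquigarrow (1)(2)\cdots(k-1)(k). \]

One can define the following mixed succession rule:

\[ \begin{cases} (1) \\ (k) \stackrel{2}{\rightsquigarrow} (2)(3)\cdots(k)(k+1) \\ \phantom{(k)} \stackrel{3}{\rightsquigarrow} (1)(2)\cdots(k-1)(k) \end{cases} \]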

The above formal system must be interpreted as follows: in its mixed generating
tree a node at level $n$ produces 2 sets of sons, one set at level $n + 2$ (according
to the first production rule) and the other at level $n + 3$ (according to the second
production rule). From the point of view of the ECO method, it is not difficult
to see that the above mixed succession rule arises from a natural construction of
generalized Motzkin paths using a horizontal step of length 3.
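The interpretation just described can be turned into a small computation. In the Python sketch below (an illustration only; the name `mixed_level_counts` and the representation of a mixed rule as a list of (edge length, production) pairs are our own assumptions), a node labelled $(k)$ at level $n$ contributes its sons at level $n$ plus the edge length of each production.

```python
from collections import Counter

def mixed_level_counts(axiom, rules, max_level):
    """Count nodes at levels 0..max_level of a mixed generating tree.
    `rules` is a list of (length, production) pairs: a node (k) at level n
    produces the labels production(k) at level n + length."""
    levels = [Counter() for _ in range(max_level + 1)]
    levels[0][axiom] = 1
    for n in range(max_level + 1):
        for label, mult in levels[n].items():
            for length, production in rules:
                if n + length <= max_level:
                    for son in production(label):
                        levels[n + length][son] += mult
    return [sum(c.values()) for c in levels]

motzkin_like = [
    (2, lambda k: range(2, k + 2)),   # (k) -> (2)(3)...(k)(k+1), edge length 2
    (3, lambda k: range(1, k + 1)),   # (k) -> (1)(2)...(k-1)(k), edge length 3
]
```

With axiom (1), the first levels contain 1, 0, 1, 1, 2, 3, 6 nodes, which agrees with a direct count of Motzkin-type paths of total length 0 to 6 in which up and down steps have length 1 and horizontal steps have length 3.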

The theory of mixed succession rules has not yet been developed in its full
generality, but some specific cases have been considered. Here we only mention the
paper [FPPR], in which a particular kind of mixed succession rules (called jumping
succession rules) is systematically studied.

Another generalization of the concept of succession rule we will need in the 
sequel is the notion of ECO system with multiple labels. More precisely, it is an 
ECO system in which the labels are not simply positive integers, but rather couples 
(more generally, h-tuples) of positive integers. Also this concept has occasionally 
surfaced in previous works (see, for instance, [GPP]), but the theory behind it 
remains largely undeveloped. 



3. A general succession rule for integer partitions 

In this section we will rediscover a classical construction of integer partitions giving 
it an ECO interpretation. 

For any $k \in \mathbb{N}$, $k \neq 0$, consider the following mixed succession rule:

\[ \Omega_k : \begin{cases} (k) \\ (h) \stackrel{1}{\rightsquigarrow} (1) \\ \phantom{(h)} \stackrel{2}{\rightsquigarrow} (2) \\ \phantom{(h)} \ \ \vdots \\ \phantom{(h)} \stackrel{h}{\rightsquigarrow} (h) \end{cases} \tag{3} \]

If we denote by $F_k(x)$ the generating function of the numerical sequence
associated with the rule $\Omega_k$ we easily have:

\[ F_1(x) = \frac{1}{1-x}, \qquad F_2(x) = \frac{1}{(1-x)(1-x^2)}. \]

For a generic $k$, a look at the generating tree of the rule suggests the following
equality:

\[ F_k(x) = 1 + xF_1(x) + x^2F_2(x) + \cdots + x^kF_k(x), \]

whence, using a simple induction argument:

\[ F_k(x) = \prod_{i=1}^{k} \frac{1}{1-x^i}. \tag{4} \]

If we let $k$ tend to infinity, we obtain the following generating function:

\[ F(x) = \lim_{k \to \infty} F_k(x) = \prod_{k \ge 1} \frac{1}{1-x^k}. \tag{5} \]



Finding an ECO system $\Omega$ associated with the generating function $F(x)$ is
possible, provided that we allow the use of the symbol $\infty$ as a possible label. In
fact, we get for $\Omega$ the following expression:

\[ \Omega : \begin{cases} (\infty) \\ (h) \stackrel{1}{\rightsquigarrow} (1) \\ \phantom{(h)} \stackrel{2}{\rightsquigarrow} (2) \\ \phantom{(h)} \ \ \vdots \\ \phantom{(h)} \stackrel{h}{\rightsquigarrow} (h) \end{cases} \tag{6} \]

where the unique node labelled $(\infty)$ (which is the axiom of the system) produces,
by convention, a node labelled $(i)$ at level $i$, for every $i \ge 1$ (so that it has an
infinite number of sons). The first levels of the generating tree of $\Omega$ are given in
Fig. 2.





Figure 2. The first 4 levels of the generating tree of $\Omega$.

Using a terminology taken from [FPPR], we can informally say that $\Omega =
\lim_{k \to \infty} \Omega_k$. The rule $\Omega$ enumerates precisely integer partitions, since $F(x)$ is
their well-known generating function. Every integer partition $\lambda = (p_1, \ldots, p_l)$
has a simple graphical representation in terms of Ferrers diagrams made of $l$ rows
such that the $i$-th row is made of $p_i$ cells.

There is an ECO construction of Ferrers diagrams of partitions encoded by
$\Omega$; it can be described as follows:

i) the empty object (Ferrers diagram without cells, corresponding to the void
partition of 0) has label $(\infty)$ and produces a single row having $h$ cells at
each level $h$ (corresponding to the partition $(h)$ of $h$);

ii) a Ferrers diagram representing a partition of $n$, whose bottom row is made
of $h$ cells, has label $(h)$ and produces $h$ Ferrers diagrams representing
partitions of $n + 1, \ldots, n + h$, respectively, and having labels $(1), \ldots, (h)$,
as shown in Fig. 3.




Figure 3. The ECO construction for partitions. 

An obvious consequence of this construction is that, for any $k \in \mathbb{N}$, the rule
$\Omega_k$ enumerates partitions $\lambda = (p_1, \ldots, p_l)$ where each part $p_i$ is less than or equal
to $k$.
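The construction i)-ii) and the product formula (4) can be compared directly. The following Python sketch is a hedged illustration (the names `eco_partitions` and `product_coefficients` are ours): it grows Ferrers diagrams by appending a bottom row no longer than the current one, optionally starting from the axiom $(k)$ instead of $(\infty)$, and separately expands $\prod_{i=1}^{k} 1/(1-x^i)$.

```python
def eco_partitions(n_max, k=None):
    """Generate partitions by the ECO construction: starting from the empty
    partition, repeatedly append a new bottom row whose length is at most
    that of the current bottom row (at most k for the first row if k is
    given).  Returns a dict size -> list of partitions of size <= n_max."""
    by_size = {n: [] for n in range(n_max + 1)}
    stack = [()]
    while stack:
        lam = stack.pop()
        size = sum(lam)
        by_size[size].append(lam)
        # label of the node: bottom row length, or the axiom for the empty diagram
        bound = lam[-1] if lam else (k if k is not None else n_max)
        for row in range(1, min(bound, n_max - size) + 1):
            stack.append(lam + (row,))
    return by_size

def product_coefficients(n_max, k):
    """Coefficients of prod_{i=1..k} 1/(1 - x^i) up to x^n_max."""
    coeffs = [1] + [0] * n_max
    for part in range(1, k + 1):
        for n in range(part, n_max + 1):
            coeffs[n] += coeffs[n - part]
    return coeffs
```

For instance, the level sizes of `eco_partitions(10, k=3)` coincide with `product_coefficients(10, 3)`, and without the bound `k` one recovers the partition numbers 1, 1, 2, 3, 5, 7, 11, ...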



4. A generalization and some examples 

The previous rule for integer partitions can be suitably generalized. Suppose we
consider only those partitions using a prescribed type of parts. More precisely, a
sufficiently general setting consists of taking partitions each of whose parts depends
on the previous one. So, given a partition $\lambda = (p_1, \ldots, p_l) \vdash n$, we require that
$p_i \in \Pi(p_{i-1}) \subseteq \mathbb{N}$ (i.e., the set $\Pi(p_{i-1})$ only depends on $p_{i-1}$). In particular,
we define $\Pi(\infty)$ to be the (possibly infinite) set of allowed "maximum parts" for
the partitions we are taking into consideration. Therefore, a construction for the
above described class of partitions can be obtained by suitably modifying the
general ECO construction for partitions given above, so getting to the following
succession rule:

\[ \begin{cases} (\infty) \\ (\infty) \stackrel{i_1}{\rightsquigarrow} (a_1) \\ \phantom{(\infty)} \ \ \vdots \\ \phantom{(\infty)} \stackrel{i_n}{\rightsquigarrow} (a_n) \\ \phantom{(\infty)} \ \ \vdots \\ (k) \stackrel{j_1}{\rightsquigarrow} (k_1) \\ \phantom{(k)} \ \ \vdots \\ \phantom{(k)} \stackrel{j_k}{\rightsquigarrow} (k_k) \end{cases} \tag{7} \]

In the above ECO system the set $\{(a_n)\}_n$ is the set of labels of the parts
belonging to the set $\Pi(\infty) = \{i_n\}_n$, whereas, if $(k)$ is the label of a part $p$ of a
partition, then $\{(k_1), \ldots, (k_k)\}$ is the set of labels of the parts belonging to the set
$\Pi(p) = \{j_1, \ldots, j_k\}$. Some interesting examples fall into this framework and can
be obtained by specializing the quite general succession rule in (7).

4.1. Partitions into distinct parts and partitions into odd parts 

Let $\mathcal{D}$ be the set of partitions into distinct parts and $\mathcal{O}$ the set of partitions into odd
parts. The two classes are easily proved to be enumerated by the same generating
function:

\[ \sum_{\lambda \in \mathcal{D}} x^{|\lambda|} = \prod_{i \ge 1} (1 + x^i) = \prod_{i \ge 1} \frac{1 - x^{2i}}{1 - x^i} = \prod_{i \ge 0} \frac{1}{1 - x^{2i+1}} = \sum_{\lambda \in \mathcal{O}} x^{|\lambda|}. \tag{8} \]

The ECO construction previously proposed for integer partitions can be
slightly modified in order to describe an ECO construction for the classes $\mathcal{O}$ and
$\mathcal{D}$.

A construction for $\mathcal{D}$: Partitions of integers into distinct parts are represented
by Ferrers diagrams whose rows all have different lengths. The
ECO construction is the following:

i) the empty partition of 0 has label $(\infty)$ and produces a single row made
of $h$ cells at each level $h$ (corresponding to the partition $(h)$ of $h$); so
$\Pi(\infty) = \mathbb{N}^* = \mathbb{N} \setminus \{0\}$;

ii) a Ferrers diagram representing a partition of $n$, whose bottom row is
made of $h + 1$ cells, is labelled $(h)$ and produces $h$ Ferrers diagrams representing
partitions of $n + 1, \ldots, n + h$ and having labels $(0), \ldots, (h - 1)$,
respectively, as shown in Fig. 4; therefore, $\Pi(h + 1) = \{1, 2, \ldots, h\}$.




Figure 4. The ECO construction for partitions into distinct parts. 



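The succession rule associated with this construction is the following:

\[ \Omega_{\mathcal{D}} : \begin{cases} (\infty) \\ (h) \stackrel{1}{\rightsquigarrow} (0) \\ \phantom{(h)} \stackrel{2}{\rightsquigarrow} (1) \\ \phantom{(h)} \ \ \vdots \\ \phantom{(h)} \stackrel{h}{\rightsquigarrow} (h-1) \end{cases} \tag{9} \]

In the generating tree of $\Omega_{\mathcal{D}}$ a node labelled (0) produces no son. The
generating function for $\Omega_{\mathcal{D}}$ can be obtained in an analogous way to that for
the generating function of $\Omega$, getting again the series $F_{\mathcal{D}}(x) = \prod_{i \ge 1} (1 + x^i)$
in (8).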

A construction for $\mathcal{O}$: An ECO construction for the class of partitions into
odd parts can be obtained by specializing the above described general
setting to the case in which $\Pi(\infty)$ is the set of odd positive integers and,
for every odd $p$, $\Pi(p) = \{q \mid q \text{ odd}, q \le p\}$ (see Fig. 5).

Figure 5. The ECO construction for partitions into odd parts.

The associated succession rule is:

\[ \begin{cases} (\infty) \\ (h) \stackrel{1}{\rightsquigarrow} (1) \\ \phantom{(h)} \stackrel{3}{\rightsquigarrow} (2) \\ \phantom{(h)} \ \ \vdots \\ \phantom{(h)} \stackrel{2h-1}{\rightsquigarrow} (h) \end{cases} \]

Also in this case it is easy to determine the generating function by
an inductive argument, so obtaining $F_{\mathcal{O}}(x) = \prod_{i \ge 0} \frac{1}{1 - x^{2i+1}}$, as is well
known.

This last example is particularly nice, since $\Pi(\infty)$ is the set of all
possible parts appearing in the ECO system (odd positive integers) and
$\Pi(p)$ is just the subset of $\Pi(\infty)$ whose elements are less than or equal to
$p$. More generally, if $\Pi(\infty) = \{a_n\}_n$ and $\Pi(a_n) = \{a_k \mid k \le n\}$, the usual
inductive argument leads to the generating function

\[ \prod_{n} \frac{1}{1 - x^{a_n}}. \]

This is the classical result concerning the enumeration of partitions
whose parts belong to a fixed set.
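The general setting of this section, in which each part must belong to a set depending on the previous part, is easy to simulate. The Python sketch below is an illustration under our own naming (`restricted_partitions`, `first_parts`, `next_parts` are hypothetical helpers, not taken from the paper); it specializes the setting to distinct parts and to odd parts and lets one compare the two counting sequences of (8).

```python
def restricted_partitions(n_max, first_parts, next_parts):
    """Count partitions whose first part lies in first_parts(n_max) and whose
    every later part p_i lies in next_parts(p_{i-1}).
    Returns the list of counts for sizes 0..n_max."""
    counts = [0] * (n_max + 1)

    def grow(last, size):
        counts[size] += 1
        allowed = first_parts(n_max) if last is None else next_parts(last)
        for part in allowed:
            if size + part <= n_max:
                grow(part, size + part)

    grow(None, 0)
    return counts

# Distinct parts: Pi(inf) = positive integers, Pi(p) = {1, ..., p - 1}.
distinct = restricted_partitions(
    12, lambda n: range(1, n + 1), lambda p: range(1, p))

# Odd parts: Pi(inf) = odd positive integers, Pi(p) = {q odd, q <= p}.
odd = restricted_partitions(
    12, lambda n: range(1, n + 1, 2), lambda p: range(1, p + 1, 2))
```

Both lists start 1, 1, 1, 2, 2, 3, 4, 5 and agree up to the chosen bound, in accordance with Euler's identity (8).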









5. An alternative ECO system for integer partitions 

The ECO approach suggests an alternative construction for integer partitions.
Starting from a given partition $\lambda = (p_1, \ldots, p_l)$ of $n$, we define two new partitions
of $n + 1$ and $n + p_1$, respectively, which are precisely $\lambda_1 = (p_1 + 1, p_2, \ldots, p_l) \vdash n + 1$
and $\lambda_2 = (p_1, p_1, p_2, \ldots, p_l) \vdash n + p_1$. Fig. 6 graphically describes this construction
on Ferrers diagrams.

Figure 6. An alternative ECO construction for integer partitions.

Such a construction is immediately seen to be associated with the succession
rule:

\[ \begin{cases} (1) \\ (h) \stackrel{1}{\rightsquigarrow} (h+1) \\ \phantom{(h)} \stackrel{h}{\rightsquigarrow} (h) \end{cases} \tag{10} \]

The rule in (10) does not satisfy the consistency principle typical of ECO
systems, i.e. labels do not denote the number of sons; in particular, each node in
its generating tree produces exactly two sons. According to the construction
suggested by Fig. 6, each label $(h)$ corresponds to a partition whose maximum
part is $h$.
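One can check experimentally that the two moves $\lambda \mapsto \lambda_1$ and $\lambda \mapsto \lambda_2$ produce every non-empty partition exactly once. The Python sketch below is an illustration only (the function name `alternative_eco_counts` is ours); it starts from the partition (1), applies both moves, and counts the partitions obtained at each size.

```python
def alternative_eco_counts(n_max):
    """Count nodes per level of the tree built by the two moves
    lambda -> (p1 + 1, p2, ...)   (one level down) and
    lambda -> (p1, p1, p2, ...)   (p1 levels down),
    starting from the partition (1) at level 1."""
    counts = [0] * (n_max + 1)
    seen = set()
    stack = [(1,)]
    while stack:
        lam = stack.pop()
        size = sum(lam)
        if size > n_max:
            continue
        assert lam not in seen      # each partition is produced only once
        seen.add(lam)
        counts[size] += 1
        stack.append((lam[0] + 1,) + lam[1:])   # add a cell to the first row
        stack.append((lam[0],) + lam)           # duplicate the first row
    return counts
```

The counts at levels 1, 2, 3, ... are the partition numbers 1, 2, 3, 5, 7, 11, ...; uniqueness holds because a partition with $p_1 > p_2$ (or a single part larger than 1) can only come from the first move, and one with $p_1 = p_2$ only from the second.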



6. Some applications and open problems 

6.1. Lecture Hall partitions 

The theory of Lecture Hall partitions was initiated in [BME1], which is the
basic article we refer the reader to concerning this topic.

Only in this section, we change our notation for partitions: if $\lambda = (p_1, \ldots, p_l)$
then we assume that $p_1 \le p_2 \le \cdots \le p_l$. For $k \ge 1$, let $L_k$ be the following set of
partitions (having possibly some empty parts):

\[ L_k = \left\{ (p_1, \ldots, p_k) \;\middle|\; 0 \le \frac{p_1}{1} \le \frac{p_2}{2} \le \cdots \le \frac{p_k}{k} \right\}. \]

The elements of $L_k$ are called Lecture Hall partitions of length $k$. We also
denote by $\mathcal{D}_k$ the set of all partitions of $L_k$ with empty parts removed. For example,
the partition (2, 3) of 5 belongs to $\mathcal{D}_3$ but not to $\mathcal{D}_2$. It is clear that, for any given
$\lambda$, there exists a minimum $k$ such that $\lambda \in \mathcal{D}_k$: this will be called the minimum
length of $\lambda$ as a Lecture Hall partition. The concept of Lecture Hall partition allows
one to give a finite version of the well-known result (due to Euler) that the number of
partitions of $n$ into odd parts equals the number of partitions of $n$ into distinct
parts. More precisely, in [BME1] it is shown that the number of partitions of $n$
into odd parts less than or equal to $2k - 1$ is equal to the number of Lecture Hall
partitions of length $k$ of $n$.

The first bijective proof of (a refined version of) this result appears in [BME2];
however, the authors themselves admit that such a proof finds its origin in the
algebraic context of Coxeter groups. Some bijective proofs have been recently
given in [E, Y]. The present approach to integer partitions suggests a possible
way to find a new natural bijection proving the Lecture Hall theorem in a purely
combinatorial way. The idea is to give two distinct combinatorial interpretations
to the generating tree associated with a given ECO construction.

A possible ECO construction for $\mathcal{O}$ can be obtained by suitably modifying
the one given in section 5 for unrestricted partitions. It is not difficult to see that
the associated succession rule is the following:

\[ \begin{cases} (1) \\ (h) \stackrel{2}{\rightsquigarrow} (h+2) \\ \phantom{(h)} \stackrel{h}{\rightsquigarrow} (h) \end{cases} \tag{11} \]

It is clear that the set of labels of this ECO system is the set of odd positive
integers. The first levels of its generating tree are the following:

Figure 7. The first levels of the generating tree of the rule (11).

We conjecture that the ECO system in (11) provides a construction also
for partitions into distinct parts. This would lead to a presumably new bijection
between $\mathcal{O}$ and $\mathcal{D}$ from which it would be immediate to deduce an explicit bijection
proving the Lecture Hall theorem. Indeed, in the conjectured interpretation of the
above generating tree, it seems natural to think that the label of a partition is
strictly related to its (minimum) length as a Lecture Hall partition: more precisely,
a node labelled $2k - 1$ represents a Lecture Hall partition of minimum length $k$.

Concerning this problem, we cite the remarkable paper [P], where the author
introduces a truly nice setting to deal with bijective questions on partitions which
could be useful in this context.
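As a numerical sanity check of the counting side of this conjecture, one can compare the level sizes of the generating tree of (11) with the number of partitions into distinct parts. The Python sketch below is only an illustration, and it rests on an assumption: we read (11) as having axiom (1) at level 1, the production $(h) \to (h+2)$ two levels down (adding 2 to the largest part) and $(h) \to (h)$ $h$ levels down (duplicating the largest part). The function names are ours.

```python
def odd_rule_counts(n_max):
    """Nodes per level for the rule with axiom (1) at level 1 and productions
    (h) -> (h + 2) two levels down and (h) -> (h) h levels down."""
    counts = [0] * (n_max + 1)
    stack = [(1, 1)]                    # (label, level)
    while stack:
        h, level = stack.pop()
        if level > n_max:
            continue
        counts[level] += 1
        stack.append((h + 2, level + 2))
        stack.append((h, level + h))
    return counts

def distinct_part_counts(n_max):
    """Coefficients of prod_{i>=1} (1 + x^i) up to x^n_max."""
    coeffs = [1] + [0] * n_max
    for part in range(1, n_max + 1):
        for n in range(n_max, part - 1, -1):
            coeffs[n] += coeffs[n - part]
    return coeffs
```

Under this reading the two sequences agree from level 1 onwards, as Euler's theorem requires; the check says nothing about which explicit bijection realizes the conjecture.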







6.2. Generalized Hook partitions 

In the paper [BR], Berele and Regev show how the representation theory of Lie
superalgebras heavily relies upon the knowledge of the combinatorics of partitions
fitting inside of a hook shaped figure (we will briefly call them generalized hook
partitions). The above article has been followed by many others, such as [R1,
R2], and the study of this kind of partitions is still an object of investigation in
combinatorics and algebra.

Figure 8. A hook partition of shape (2,3): the partition (7, 5, 3, 2, 1, 1).

Let $\mathcal{H}_{h,k}$ be the set of generalized hook partitions of shape $(h,k)$, that is, by
definition, the set of all partitions that fit inside a hook shape of $k$ rows and $h$
columns. Our aim is to restrict the general ECO construction for partitions to the
set $\mathcal{H}_{h,k}$. This will lead us to the determination of the generating function of $\mathcal{H}_{h,k}$.

We consider two disjoint subsets of $\mathcal{H}_{h,k}$.

i) Partitions having $j < k$ parts; a partition in this set has label $(l, j)$, where
$l$ is the number of cells in the last row of its Ferrers diagram. The ECO
construction applied to such a partition works exactly like in the general
case, leading to the following production:

\[ (l, j) \begin{cases} \stackrel{1}{\rightsquigarrow} (1, j+1) \\ \ \ \vdots \\ \stackrel{l}{\rightsquigarrow} (l, j+1) \end{cases} \tag{12} \]

Observe that (12) is a production with double labels, since we have
to take into account also the number of parts. However, if $j = k - 1$, we
choose to delete the second label, for reasons that will be explained below.

ii) All the remaining partitions. In this case, because of the hook shape constraint,
the ECO construction is made by adding to a Ferrers diagram
only rows having at most $h$ cells. Since the last production described in i)
produces simple labels, here we can avoid the use of multiple labels. The
associated production is then the following:

\[ (l) \begin{cases} \stackrel{1}{\rightsquigarrow} (1) \\ \ \ \vdots \\ \stackrel{m}{\rightsquigarrow} (m) \end{cases} \tag{13} \]

where $m = \min(l, h)$.







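Let $H_{h,k}(x)$ be the generating function for the class $\mathcal{H}_{h,k}$. For any label $(l, j)$,
with $j < k$, denoting by $H^{(l,j)}_{h,k}(x)$ the generating function of the ECO system
having $(l, j)$ as axiom and the above ones as production rules, we can deduce the
following recursion:

\[ H^{(l,j)}_{h,k}(x) = 1 + x H^{(1,j+1)}_{h,k}(x) + \cdots + x^l H^{(l,j+1)}_{h,k}(x). \tag{14} \]

Analogously, if $j \ge k$ we have:

\[ H^{(l)}_{h,k}(x) = 1 + x H^{(1)}_{h,k}(x) + \cdots + x^m H^{(m)}_{h,k}(x). \tag{15} \]

Starting from this general setting we are able to compute the generating
function of generalized hook partitions of any fixed shape.

For instance, if we consider hook partitions of shape (2,3), we get the
following succession rule:

\[ \begin{cases} (\infty, 0) \\ (l, 1) \stackrel{1}{\rightsquigarrow} (1, 2) \ \cdots\ \stackrel{l}{\rightsquigarrow} (l, 2) \\ (l, 2) \stackrel{1}{\rightsquigarrow} (1) \ \cdots\ \stackrel{l}{\rightsquigarrow} (l) \\ (1) \stackrel{1}{\rightsquigarrow} (1) \\ (l) \stackrel{1}{\rightsquigarrow} (1) \ \stackrel{2}{\rightsquigarrow} (2) \end{cases} \tag{16} \]

leading to the generating function:

\[ H_{2,3}(x) = H_{3,2}(x) = \frac{1}{(1-x)(1-x^2)(1-x^3)} \left( 1 + \frac{x^4}{1-x} + \frac{x^8}{(1-x)(1-x^2)} \right). \]

More generally, we have:

\[ H_{h,k}(x) = \frac{1}{(1-x) \cdots (1-x^h)} \left( 1 + \frac{x^{h+1}}{1-x} + \cdots + \frac{x^{k(h+1)}}{(1-x) \cdots (1-x^k)} \right). \tag{17} \]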



Observe that the equality $H_{h,k}(x) = H_{k,h}(x)$, which is immediate from a
combinatorial point of view, is by no means obvious from an algebraic one.
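The symmetry can at least be observed numerically by brute force. The Python sketch below is an illustration (the function `hook_counts` is ours); it assumes the definition given above, namely that a partition fits in the hook of shape $(h,k)$ when every row after the $k$-th has at most $h$ cells, and counts such partitions by size.

```python
def hook_counts(h, k, n_max):
    """Number of partitions of 0..n_max that fit in the hook of shape (h, k),
    i.e. whose (k+1)-st part, if any, is at most h."""
    counts = [0] * (n_max + 1)

    def grow(lam, size):
        counts[size] += 1
        bound = lam[-1] if lam else n_max
        if len(lam) >= k:               # rows below the k-th have at most h cells
            bound = min(bound, h)
        for row in range(1, min(bound, n_max - size) + 1):
            grow(lam + [row], size + row)

    grow([], 0)
    return counts
```

For example, `hook_counts(2, 3, 12)` and `hook_counts(3, 2, 12)` return the same list; combinatorially this is conjugation of Ferrers diagrams.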

We point out here that the problem of determining the generating functions
$H_{h,k}(x)$ was previously considered in [OZ], where the authors find an explicit
formula for them. However, their expression involves rather complicated quantities,
whereas formula (17) is quite easy to read. It would be interesting to extend
this result to the case of partitions fitting inside of the intersection of two hooks
of different shapes. This problem also arises in the study of representations of Lie
superalgebras, in connection with the module decomposition of supersymmetric
power of matrices [S], and it is still open.

References 

[BDLPP] E. Barcucci, A. Del Lungo, E. Pergola, R. Pinzani, ECO: A methodology for the
enumeration of combinatorial objects, J. Differ. Equations Appl. 5 (1999) 435-490.

[BR] A. Berele, A. Regev, Hook Young diagrams with applications to combinatorics and
to representations of Lie superalgebras, Adv. in Math. 64 (1987) 118-175.

[BME1] M. Bousquet-Mélou, K. Eriksson, Lecture Hall partitions, Ramanujan J. 1 (1997)
101-111.

[BME2] M. Bousquet-Mélou, K. Eriksson, A refinement of the Lecture Hall theorem, J.
Combin. Theory Ser. A 86 (1999) 63-84.

[DFR] E. Deutsch, L. Ferrari, S. Rinaldi, Production matrices, (submitted).

[E] N. Eriksen, A simple bijection between Lecture Hall partitions and partitions into
odd integers, proceedings of FPSAC 2002, Melbourne.

[FPPR] L. Ferrari, E. Pergola, R. Pinzani, S. Rinaldi, Jumping succession rules and their
generating functions, Discrete Math. 271 (2003) 29-50.

[FP] L. Ferrari, R. Pinzani, A linear operator approach to succession rules, Linear Algebra
Appl. 348 (2002) 231-246.

[GPP] O. Guibert, E. Pergola, R. Pinzani, Vexillary involutions are enumerated by
Motzkin numbers, Ann. Comb. 5 (2001) 153-174.

[OZ] R. C. Orellana, M. Zabrocki, Some remarks on the characters of the general Lie
superalgebra, arXiv:math.CO/0008152v1, 2000.

[P] I. Pak, Partition identities and geometric bijections, Proc. Amer. Math. Soc., to
appear.

[R1] J. B. Remmel, The combinatorics of (k,l)-hook Schur functions, Combinatorics and
Algebra (Boulder, Colorado, 1983), 253-287, Contemp. Math., 34, Amer. Math. Soc.,
Providence, RI, 1984.

[R2] J. B. Remmel, A bijective proof of a factorization theorem for (k,l)-hook Schur
functions, Linear and Multilinear Algebra 28 (1990) 119-154.

[S] T. Seeman, private communication, 2003.

[Y] A.J. Yee, On the refined lecture hall theorem, Discrete Math. 248 (2002) 293-298.

Luca Ferrari 

Dipartimento di Scienze Matematiche ed Informatiche, Pian dei Mantellini, 44, 

53100, Siena, Italy 

ferrari@math.unifi.it 

Simone Rinaldi 

Dipartimento di Scienze Matematiche ed Informatiche, Pian dei Mantellini, 44, 

53100, Siena, Italy 

rinaldi@unisi.it 

Renzo Pinzani 

Dipartimento di Sistemi e Informatica, via Lombroso 6/17, 50135 Firenze, Italy 
pinzani@dsi.unifi.it 




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



321-Avoiding Permutations and Chebyshev
Polynomials

Toufik Mansour 



ABSTRACT: In [6] it was shown that the generating function for the number
of permutations on $n$ letters avoiding both $321$ and $(d+1)(d+2)\ldots k12\ldots d$ is
given by $\frac{2tU_{k-1}(t)}{U_k(t)}$ for all $k \ge 2$, $2 \le d+1 \le k$, where $U_m$ is the $m$th Chebyshev
polynomial of the second kind and $t = \frac{1}{2\sqrt{x}}$. In this paper we present three different
classes of 321-avoiding permutations which are enumerated by this generating
function.



Let $\alpha \in S_n$ and $\tau \in S_k$ be two permutations. Then $\alpha$ contains $\tau$ if there
exists a subsequence $1 \le i_1 < i_2 < \cdots < i_k \le n$ such that $(\alpha_{i_1}, \ldots, \alpha_{i_k})$ is
order-isomorphic to $\tau$; in such a context $\tau$ is usually called a pattern; $\alpha$ avoids $\tau$, or
is $\tau$-avoiding, if $\alpha$ does not contain such a subsequence. The set of all $\tau$-avoiding
permutations in $S_n$ is denoted by $S_n(\tau)$. For a collection of patterns $T$, $\alpha$ avoids
$T$ if $\alpha$ avoids all $\tau \in T$; the corresponding subset of $S_n$ is denoted by $S_n(T)$.
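
The definition of containment can be made concrete with a brute-force check. The following is only a minimal sketch: the function names `contains` and `avoids` are illustrative choices, permutations are represented as tuples of distinct integers, and no attempt is made at efficiency.

```python
from itertools import combinations


def contains(alpha, tau):
    """Return True if the permutation alpha contains the pattern tau.

    Two sequences of distinct numbers are order-isomorphic exactly when
    sorting their positions by value gives the same list of positions.
    """
    k = len(tau)
    target = sorted(range(k), key=lambda i: tau[i])
    for idx in combinations(range(len(alpha)), k):
        sub = [alpha[i] for i in idx]
        if sorted(range(k), key=lambda i: sub[i]) == target:
            return True
    return False


def avoids(alpha, patterns):
    """Return True if alpha avoids every pattern in the collection."""
    return not any(contains(alpha, tau) for tau in patterns)
```

For instance, `contains((3, 1, 4, 2), (2, 1, 3))` holds because the subsequence 3, 1, 4 is order-isomorphic to 213.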

While the case of permutations avoiding a single pattern has attracted much 
attention, the case of multiple pattern avoidance remains less investigated. In 
particular, it is natural to consider permutations avoiding pairs of patterns $\tau_1$,
$\tau_2$. This problem was solved completely for $\tau_1, \tau_2 \in S_3$ (see [8]) and for $\tau_1 \in S_3$ and
$\tau_2 \in S_4$ (see [10]). Several recent papers [1, 2, 4, 5, 6, 7] deal with the case $\tau_1 \in S_3$,
$\tau_2 \in S_k$ for various pairs $\tau_1, \tau_2$; e.g., in [1] it was found by using transfer matrices
that the generating function for the number of permutations in $S_n(321, [k,k])$ is
given by

$$R_k(x) = \frac{2tU_{k-1}(t)}{U_k(t)}, \qquad t = \frac{1}{2\sqrt{x}},$$

where $U_m(\cos\theta) = \sin((m+1)\theta)/\sin\theta$ is the $m$th Chebyshev polynomial of the
second kind and $[d,k] = d(d+1)\ldots k12\ldots(d-1)$. Later, in [6] Mansour and
Vainshtein proved a natural generalization of this theorem.



Theorem 1. For any $k \ge 2$ and $2 \le d+1 \le k$, the generating function for the
number of permutations in $S_n(321, [d+1,k])$ is given by $R_k(x)$.
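
Theorem 1 can be checked by exhaustive enumeration for small parameters. The sketch below is an illustration, not the method of [6]. It assumes $t = 1/(2\sqrt{x})$, under which the Chebyshev recurrence $U_k = 2tU_{k-1} - U_{k-2}$ turns into $R_k(x) = 1/(1 - xR_{k-1}(x))$ with $R_1(x) = 1$; all function names are hypothetical.

```python
from itertools import combinations, permutations


def contains(alpha, tau):
    """True if alpha contains a subsequence order-isomorphic to tau."""
    k = len(tau)
    target = sorted(range(k), key=lambda i: tau[i])
    return any(
        sorted(range(k), key=lambda i: alpha[idx[i]]) == target
        for idx in combinations(range(len(alpha)), k)
    )


def layered_pattern(d, k):
    """The pattern [d, k] = d (d+1) ... k 1 2 ... (d-1) as a tuple."""
    return tuple(range(d, k + 1)) + tuple(range(1, d))


def count_avoiders(n, patterns):
    """Number of permutations of 1..n avoiding all given patterns (brute force)."""
    return sum(
        1
        for p in permutations(range(1, n + 1))
        if not any(contains(p, t) for t in patterns)
    )


def rk_coefficients(k, n_max):
    """Coefficients of R_k(x) up to x^n_max, using R_k = 1 / (1 - x R_{k-1})."""
    r = [1] + [0] * n_max  # R_1(x) = 1
    for _ in range(k - 1):
        # s = 1 / (1 - x r) is equivalent to s = 1 + x r s.
        s = [1] + [0] * n_max
        for n in range(1, n_max + 1):
            s[n] = sum(r[j] * s[n - 1 - j] for j in range(n))
        r = s
    return r


if __name__ == "__main__":
    k, d = 4, 2  # pattern [d+1, k] = 3412
    series = rk_coefficients(k, 6)  # starts 1, 1, 2, 5, 13, 34
    counts = [count_avoiders(n, [(3, 2, 1), layered_pattern(d + 1, k)]) for n in range(7)]
    print(series, counts)
```

The enumeration is factorial in $n$, so it is only meant as a sanity check for very small $n$ and $k$.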



Recently, Mansour and Stankova [3] presented an exact enumeration for the
case of 321-$k$-gon-avoiding permutations in $S_n$ which generalizes the methods in [9]
and [6]. In particular they proved the following result:

Theorem 2. For any $k \ge 4$ and $2 \le d \le k-2$, the generating function for the
number of permutations in $S_n(321, (d+1)(d+2)\cdots(k-1)1k23\cdots d)$ is given by
$R_k(x)$.







Let us define three patterns:

$\alpha_{d,k} = d(d+2)(d+3)\ldots k12\ldots(d-1)(d+1)$,

$\beta_{d,k} = d(d+2)(d+3)\ldots(k-1)1k23\ldots(d-1)(d+1)$,

$\gamma_{d,k} = d(d+2)(d+4)\ldots k12\ldots(d-1)(d+1)(d+3)$.

The main theorem of the paper is formulated as follows.

Theorem 3.

(i) Let $k \ge 4$ and $2 \le d \le k-2$. Then the generating function for the number
of permutations which avoid both $321$ and $\alpha_{d,k}$ is given by $R_k(x)$.

(ii) Let $k \ge 6$ and $3 \le d \le k-3$. Then the generating function for the
number of permutations which avoid both $321$ and $\beta_{d,k}$ is given by $R_k(x)$.

(iii) Let $k \ge 6$ and $2 \le d \le k-4$. Then the generating function for the
number of permutations which avoid both $321$ and $\gamma_{d,k}$ is given by $R_k(x)$.

Our proof of Theorem 3 is based on finding a recursion for the numbers
in question by purely analytical means. In particular, we generalize the methods
and extend the results in [1, 3, 6].

In view of the paradigm formulated in [2], that any enumeration problem
leading to Chebyshev polynomials is related to Dyck paths, it would be tempting
to find a proof that exploits such a relation.

References 

[1] T. Chow and J. West, Forbidden subsequences and Chebyshev polynomials, Discr.
Math. 204 (1999) 119-128.

[2] C. Krattenthaler, Permutations with restricted patterns and Dyck paths, Adv. in
Applied Math. 27 (2001) 510-530.

[3] T. Mansour and Z. Stankova, 321-polygon-avoiding permutations and Chebyshev
polynomials, Elect. J. Combin. 9:2 (2003) #R5.

[4] T. Mansour and A. Vainshtein, Restricted permutations, continued fractions, and
Chebyshev polynomials, Elect. J. Combin. 7 (2000) #R17.

[5] T. Mansour and A. Vainshtein, Restricted 132-avoiding permutations, Adv. Appl.
Math. 126 (2001) 258-269.

[6] T. Mansour and A. Vainshtein, Layered restrictions and Chebyshev polynomials,
Ann. of Combin. 5 (2001) 451-458.

[7] T. Mansour and A. Vainshtein, Restricted permutations and Chebyshev polynomials,
Sem. Lothar. de Combin. 47 (2002) Article B47c.

[8] R. Simion, F.W. Schmidt, Restricted Permutations, Europ. J. Combin. 6 (1985)
383-406.

[9] Z. Stankova and J. West, Explicit enumeration of 321-hexagon-avoiding permutations,
Disc. Math., to appear.

[10] J. West, Generating trees and forbidden subsequences, Discr. Math. 157 (1996)
363-372.

Toufik Mansour 

Department of Mathematics, University of Haifa, 31905 Haifa, Israel 
toufik@math.haifa.ac.il 







Iterated Logarithm Laws and the Cycle Lengths 
of a Random Permutation 

Eugenijus Manstavicius 



ABSTRACT: We are concerned with the iterated logarithm laws for mappings
defined on the symmetric group. For the sequences of the cycle lengths and
the different cycle lengths appearing in the decomposition of a random permutation,
such laws provide asymptotic formulas valid uniformly in a wide region for
the sequence parameter. The main results are analogues of Feller's and Strassen's
theorems proved for partial sums of independent random variables.



1. Weak convergence of distributions 



Let $\sigma \in S_n$ be an arbitrary permutation and

$$\sigma = \kappa_1 \cdots \kappa_w, \qquad w = w(\sigma), \tag{1}$$

be its unique (up to the order) expression as a product of independent cycles
$\kappa_i$. Denote by $\nu_n(\ldots) = (n!)^{-1}|\{\sigma \in S_n : \ldots\}|$ the uniform probability measure on
$S_n$. In what follows we assume that $n \to \infty$. Despite the long list of asymptotic
results (see [1], [2], [7-10], [12-15], and other publications), we examine the ordered
statistics

$$1 \le J_1(\sigma) \le \cdots \le J_w(\sigma) \le n$$

and

$$1 \le j_1(\sigma) < \cdots < j_s(\sigma) \le n, \qquad s = s(\sigma),$$

consisting of the cycle lengths and the different cycle lengths appearing in
decomposition (1). In this remark, from these two very close cases we always choose the one
requiring fewer technical details.
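
To fix ideas, the two ordered statistics can be computed directly from a permutation. This is a small illustrative sketch; the representation of $\sigma$ as a list on $\{0, \ldots, n-1\}$, the function names, and the seed are assumptions made for the example.

```python
import random


def cycle_lengths(perm):
    """Cycle lengths of a permutation of {0, ..., n-1} given as a list i -> perm[i]."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if not seen[start]:
            length, i = 0, start
            while not seen[i]:
                seen[i] = True
                i = perm[i]
                length += 1
            lengths.append(length)
    return lengths


def ordered_statistics(perm):
    """Return (J, j): all cycle lengths sorted, and the distinct cycle lengths sorted."""
    J = sorted(cycle_lengths(perm))
    j = sorted(set(J))
    return J, j


rng = random.Random(2004)  # arbitrary fixed seed, for reproducibility only
sigma = list(range(20))
rng.shuffle(sigma)  # a uniformly distributed permutation of 20 points
J, j = ordered_statistics(sigma)
print("w =", len(J), "J =", J)
print("s =", len(j), "j =", j)
```

Here `len(J)` plays the role of $w(\sigma)$ and `len(j)` that of $s(\sigma)$.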

Let us recall a few classical results. In 1942 Goncharov [8] (see also [9])
found the limiting distribution, say, $F_k(x)$, of $J_{w-k}(\sigma)/n$ for arbitrary fixed $k \ge 0$.
In particular, for the longest cycle, we have

$$\nu_n(J_w(\sigma) < xn) \to F_0(x) = 1 + \sum_{1 \le l \le 1/x} \frac{(-1)^l}{l!} \int_x^1 \cdots \int_x^1 \mathbf{1}\{u_1 + \cdots + u_l \le 1\}\, \frac{du_1}{u_1} \cdots \frac{du_l}{u_l} \tag{2}$$

if $0 < x < 1$ and $F_0(x) = 1$ if $x \ge 1$. Here and in what follows $\mathbf{1}\{\cdot\}$ denotes the
indicator function. The moments of $F_k(x)$ were calculated in [13]. As it is stated
in [9] (see also [10]),

$$\nu_n(\log J_m < x\sqrt{m} + m) \to \Phi(x)$$

provided that $m = a\log n + o(\sqrt{\log n})$ and $0 < a < 1$. Here and in what follows
$\Phi(x)$ denotes the distribution function of the standard normal law.







In 1972 A.M. Vershik and A.A. Schmidt [14] announced (all details were
furnished in [15]) several results on the asymptotic distribution of the random
element

$$(J_w(\sigma)/n, J_{w-1}(\sigma)/n, \ldots, J_1(\sigma)/n, 0, \ldots)$$

under the measure $\nu_n$ in the simplex

$$\Gamma := \{(x_1, x_2, \ldots) : 1 \ge x_1 \ge x_2 \ge \cdots \ge 0,\ x_1 + x_2 + \cdots = 1\}.$$

Applying continuous mappings of $\Gamma$ to other spaces, they derived some fairly
interesting corollaries. In particular, they proved that

$$\lim_{i \to \infty} \lim_{n \to \infty} \nu_n\big(\log(J_{w-i}(\sigma)/n) < x\sqrt{i} - i\big) = \Phi(x).$$

Our questions are simple:

What asymptotic results on $J_m(\sigma)$ are valid uniformly in $m \in [a_n, b_n]$,
where $[a_n, b_n]$ is an a fortiori given subinterval of $[1, n]$? What type of
convergence can give such information?

First, we indicate how functional limit theorems can give some
approach to the problem. Let $k_j(\sigma)$, $1 \le j \le n$, denote the number of cycles
of length $j$ in (1). By the functional limit theorem for additive functions on the
symmetric group [4] or [3], the processes

$$W_n(\sigma; t) := \frac{1}{\sqrt{\log n}}\big(w(\sigma; n^t) - t\log n\big), \qquad w(\sigma; y) := \sum_{j \le y} k_j(\sigma), \qquad t \in [0,1],$$

weakly converge to the standard Brownian motion $W(t)$. Taking the maximum
functional, we obtain

$$\nu_n\Big(\max_{t \in [0,1]} |W_n(\sigma; t)| < x\Big) \to P\Big(\max_{t \in [0,1]} |W(t)| < x\Big) = \frac{1}{\sqrt{2\pi}} \sum_{l \in \mathbb{Z}} (-1)^l \int_{-x}^{x} \exp\Big\{-\frac{(u - 2lx)^2}{2}\Big\}\, du =: V(x).$$



The maximum under consideration can be attained at the points $t = \log J_m(\sigma)/\log n$,
$1 \le m \le w$, or at $t = 1$. Applying this together with $w(\sigma; J_m(\sigma)) = m$, we
derive

$$\nu_n\Big(\max_{m \le w} |\log J_m(\sigma) - m| < x\sqrt{\log n},\ |w(\sigma) - \log n| < x\sqrt{\log n}\Big) = V(x) + o(1). \tag{3}$$

Further, observing that by (2)

$$\nu_n\big(|\log J_w(\sigma) - \log n| \ge \varepsilon\sqrt{\log n}\big) = 1 - F_0(\exp\{\varepsilon\sqrt{\log n}\}) + F_0(\exp\{-\varepsilon\sqrt{\log n}\}) + o(1) = o(1)$$

for each $\varepsilon > 0$, we can drop the second event in the frequency of equality (3). So,
we obtain the following result.

Theorem 1.1. We have

$$\nu_n\Big(\max_{1 \le m \le w} |\log J_m(\sigma) - m| < x\sqrt{\log n}\Big) = V(x) + o(1). \tag{4}$$

Most probably, the last assertion could also be derived from the aforementioned
results proved in [14] and [15].
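
The limit law $V(x)$ in (4) is the distribution function of the maximum modulus of a standard Brownian motion on $[0,1]$, and it is easy to evaluate numerically. The sketch below is illustrative only: it rewrites each Gaussian integral as a difference of values of $\Phi$, and compares the result with the classical eigenfunction expansion of the same probability. The function names and the truncation parameter `terms` are assumptions of this example.

```python
import math


def Phi(x):
    """Distribution function of the standard normal law."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def V(x, terms=50):
    """P(max_{0<=t<=1} |W(t)| < x) via the alternating series of Gaussian integrals.

    The integral of the N(2lx, 1) density over [-x, x] equals
    Phi((2l+1)x) - Phi((2l-1)x). The truncation is adequate when
    terms * x is well above 5.
    """
    if x <= 0:
        return 0.0
    return sum(
        (-1) ** abs(l) * (Phi((2 * l + 1) * x) - Phi((2 * l - 1) * x))
        for l in range(-terms, terms + 1)
    )


def V_spectral(x, terms=50):
    """The same probability via the eigenfunction expansion on (-x, x)."""
    if x <= 0:
        return 0.0
    return (4.0 / math.pi) * sum(
        (-1) ** k / (2 * k + 1)
        * math.exp(-((2 * k + 1) ** 2) * math.pi ** 2 / (8.0 * x * x))
        for k in range(terms)
    )


for x in (0.5, 1.0, 2.0, 3.0):
    print(x, V(x), V_spectral(x))
```

The first series converges quickly for large $x$ and the second for small $x$, so having both is convenient when tabulating $V$.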







2. Strong convergence 

The next result related to our question involves stronger convergence than the weak
convergence of distributions. Actually, we examine an analog of the convergence
with probability one. To give a sense to this notion on a sequence of probability
spaces, we have to consider convergence in distribution of the "tails" of sequences
of random variables. In [12] we proved the following result.



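Theorem 2.1. We have

$$\lim_{n_1 \to \infty} \lim_{n \to \infty} \nu_n\Big(\max_{n_1 \le m \le s} \frac{|\log j_m(\sigma) - m|}{(2m\log\log m)^{1/2}} \ge 1 + \delta\Big) = 0 \tag{5}$$

and

$$\lim_{n_1 \to \infty} \lim_{n \to \infty} \nu_n\Big(\min_{n_1 \le m \le s} \Big|\frac{\log j_m(\sigma) - m}{(2m\log\log m)^{1/2}} - b\Big| < \delta\Big) = 1 \tag{6}$$

for each $b \in [-1, 1]$ and $\delta > 0$.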



In connection with (4), (5), and (6), it is worth recalling that the mean values
of $w = w(\sigma)$ and $s = s(\sigma)$ are asymptotically equivalent to $\log n$. The estimate of
$|\log j_m(\sigma) - m|$ following from (5) still has the error $\delta(2m\log\log m)^{1/2}$. In this
direction we now have the following improvement.

Denote $Lu = \log\max\{u, e\} = L_1 u, \ldots, L_k u = L(L_{k-1}u)$ for $u \in \mathbb{R}$. For $\delta > 0$
and $k \ge 2$, set

$$\beta_{km}(1 \pm \delta) = \Big(2m\big(L_2 m + \tfrac{3}{2}L_3 m + L_4 m + \cdots + (1 \pm \delta)L_k m\big)\Big)^{1/2}.$$
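
The iterated logarithms and the normalizing sequence are straightforward to tabulate. The following sketch is an illustration under an explicit assumption: the displayed formula does not spell out how the last coefficient is formed when $k \le 3$, so the hypothetical helper `beta` only accepts $k \ge 4$, where the coefficients are $1, \tfrac{3}{2}, 1, \ldots, 1, (1 \pm \delta)$.

```python
import math


def L(u):
    """L u = log max{u, e}."""
    return math.log(max(u, math.e))


def L_iter(k, u):
    """k-fold iterate L_k u = L(L_{k-1} u), with L_1 = L."""
    for _ in range(k):
        u = L(u)
    return u


def beta(k, m, delta, sign=1):
    """beta_{km}(1 + sign * delta) for k >= 4 (assumption: smaller k not covered)."""
    if k < 4:
        raise ValueError("this sketch only covers k >= 4")
    total = L_iter(2, m) + 1.5 * L_iter(3, m)
    total += sum(L_iter(i, m) for i in range(4, k))
    total += (1 + sign * delta) * L_iter(k, m)
    return math.sqrt(2 * m * total)


m = 10 ** 6
print(L_iter(2, m), L_iter(3, m), beta(4, m, 0.1, +1), beta(4, m, 0.1, -1))
```

Because of the truncation at $e$, the higher iterates equal 1 for all moderate $m$, which is why the refinement over $(2m\log\log m)^{1/2}$ is only visible for very large arguments.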




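Theorem 2.2. For arbitrary $0 < \delta < 1$ and $k \ge 2$, we have

$$\lim_{n_1 \to \infty} \lim_{n \to \infty} \nu_n\Big(\max_{n_1 \le m \le s} \frac{|\log j_m(\sigma) - m|}{\beta_{km}(1 + \delta)} \ge 1\Big) = 0$$

and

$$\lim_{n_1 \to \infty} \lim_{n \to \infty} \nu_n\Big(\max_{n_1 \le m \le s} \frac{|\log j_m(\sigma) - m|}{\beta_{km}(1 - \delta)} \ge 1\Big) = 1.$$

Thus, answering our question we may say that "for almost all $\sigma \in S_n$"

$$|\log j_m(\sigma) - m| \le \beta_{km}(1 + \delta)$$

uniformly in $m$, $n_1 \le m \le s(\sigma)$, where $n_1 \to \infty$ arbitrarily slowly. This assertion
is sharp in the sense that we cannot replace $\delta$ by $-\delta$.

The numbers $j_m(\sigma)$ are just the success epochs of the following partial sum
function

$$s(\sigma; y) := \sum_{j \le y} \mathbf{1}\{k_j(\sigma) \ge 1\}, \qquad 1 \le y \le n.$$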
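
The relation between the distinct cycle lengths and the partial sum function can be seen on a small example. The sketch below uses hypothetical helper names and a hand-picked permutation; it computes the cycle counts $k_j(\sigma)$, the functions $w(\sigma; y)$ and $s(\sigma; y)$, and checks that the $m$th distinct cycle length is the point where $s(\sigma; \cdot)$ first reaches $m$.

```python
from collections import Counter


def cycle_counts(perm):
    """k_j(sigma): number of cycles of length j, returned as a Counter {j: k_j}."""
    seen, counts = [False] * len(perm), Counter()
    for start in range(len(perm)):
        length, i = 0, start
        while not seen[i]:
            seen[i] = True
            i = perm[i]
            length += 1
        if length:
            counts[length] += 1
    return counts


def w(counts, y):
    """w(sigma; y): number of cycles of length at most y."""
    return sum(c for j, c in counts.items() if j <= y)


def s(counts, y):
    """s(sigma; y): number of distinct cycle lengths not exceeding y."""
    return sum(1 for j, c in counts.items() if j <= y and c >= 1)


sigma = [1, 0, 3, 2, 5, 6, 4, 7]  # cycles (0 1)(2 3)(4 5 6)(7)
k = cycle_counts(sigma)
epochs = sorted(k)  # j_1 < ... < j_s
for m, jm in enumerate(epochs, start=1):
    print(m, jm, s(k, jm), w(k, jm))
```

In this example the lengths 1, 2, 3 occur, so $s(\sigma; j_m(\sigma)) = m$ for $m = 1, 2, 3$, while $w(\sigma; \cdot)$ counts the two 2-cycles separately.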

By virtue of the identity $s(\sigma; j_m(\sigma)) = m$, but at the cost of somewhat involved
calculations, one can show that Theorem 2.2 is equivalent to the next result.
For $\delta > 0$ and $k \ge 3$, set







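Theorem 2.3. Let $0 < \delta < 1$ and $k \ge 3$ be arbitrary. We have

$$\lim_{n_1 \to \infty} \lim_{n \to \infty} \nu_n' := \lim_{n_1 \to \infty} \lim_{n \to \infty} \nu_n\Big(\max_{n_1 \le m \le n} \frac{|s(\sigma; m) - \log m|}{\gamma_{km}(1 + \delta)} \ge 1\Big) = 0$$

and

$$\lim_{n_1 \to \infty} \lim_{n \to \infty} \nu_n'' := \lim_{n_1 \to \infty} \lim_{n \to \infty} \nu_n\Big(\max_{n_1 \le m \le n} \frac{|s(\sigma; m) - \log m|}{\gamma_{km}(1 - \delta)} \ge 1\Big) = 1.$$

A proof will be given in the last section.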



3. Strassen’s law of iterated logarithm 



Drawing on our experience in probabilistic number theory [11], we can propose
a further generalization of the results of Section 2. We now deal with sequences
of functions defined on the symmetric group. Set

$$S_m(\sigma; t) := \frac{s(\sigma; m^t) - t\log m}{\sqrt{2\,Lm\,L_3 m}}, \qquad t \in [0,1], \quad 1 \le m \le n.$$

The trajectories of these random processes lie in the space $D = D[0,1]$ of the right
continuous functions on $[0,1]$ having left-hand limits at each point. We assume that
$D$ is endowed with the Skorokhod metric. We also introduce the linearized version
$G_m(\sigma; t)$ of $S_m(\sigma; t)$. For $t_l := (\log l)/\log m$ and $t \in [t_l, t_{l+1}]$, we set

$$G_m(\sigma; t) = S_m(\sigma; t_l) + \frac{t - t_l}{t_{l+1} - t_l}\big(S_m(\sigma; t_{l+1}) - S_m(\sigma; t_l)\big),$$

where $1 \le l \le m - 1$. Our problem now is to describe the set of limit curves of
$\{S_m(\sigma; t)\}$ and $\{G_m(\sigma; t)\}$ "for almost all $\sigma \in S_n$" in the sense of Section 2. Again,
to overcome some obstacles appearing in the sequence of probability spaces, we
have to introduce more general notions.

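Let $(U, d)$ be a separable metric space and $Y, Y_1, \ldots, Y_n$ be $U$-valued random
elements all defined on the probability space $\{\Omega_n, \mathcal{F}_n, P_n\}$, $\mathcal{P} = \{P_n\}$, where
$n = 1, 2, \ldots$. Let, as usual, $d(X, A) = \inf\{d(X, Z) : Z \in A\}$, where $A \subset U$. We say
that $Y_m$ converges to $Y$ $\mathcal{P}$-almost surely ($\mathcal{P}$-a.s.) if for each $\varepsilon > 0$

$$\lim_{n_1 \to \infty} \lim_{n \to \infty} P_n\Big(\max_{n_1 \le m \le n} d(Y_m, Y) \ge \varepsilon\Big) = 0.$$

Thus, a compact set $A \subset U$ such that, for each $\varepsilon > 0$ and each $X \in A$,

$$\lim_{n_1 \to \infty} \lim_{n \to \infty} P_n\Big(\max_{n_1 \le m \le n} d(Y_m, A) \ge \varepsilon\Big) = 0 \tag{7}$$

and

$$\lim_{n_1 \to \infty} \lim_{n \to \infty} P_n\Big(\min_{n_1 \le m \le n} d(Y_m, X) < \varepsilon\Big) = 1 \tag{8}$$

may be called a cluster set of the sequence $\{Y_m\}$ $\mathcal{P}$-a.s. In what follows we
denote the relations (7) and (8) by

$$Y_m \Rightarrow A \quad (\mathcal{P}\text{-a.s.}). \tag{9}$$

Denoting (8) by $Y_{m'} \to X$ ($\mathcal{P}$-a.s.) we have in mind that $m'$ can be a random
increasing subsequence.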







Let $C = C[0,1]$ be the space of continuous functions on the interval $[0,1]$
endowed with the supremum distance. We recall that the Strassen set $\mathcal{K}$
agrees with the set of absolutely continuous functions $g$ such that $g(0) = 0$ and

$$\int_0^1 (g'(t))^2\, dt \le 1.$$

These definitions are applicable to $\{G_m(\sigma; \cdot)\}$ where $1 \le m \le n$.

For the family of probabilities, we now take $\nu := \{\nu_n\}$.

Theorem 3.1. We have

$$G_m(\sigma; \cdot) \Rightarrow \mathcal{K} \quad (\nu\text{-a.s.}) \tag{10}$$

in the space $C$ and

$$S_m(\sigma; \cdot) \Rightarrow \mathcal{K} \quad (\nu\text{-a.s.}) \tag{11}$$

in the space $D$.

The idea of the proof is similar to that of Theorem 2.3. The details will
be exposed in a forthcoming paper. Applying the same argument as in the last
section, we can verify that

$$\nu_n\Big(\max_{n_1 \le m \le n} \sup_{t \in [0,1]} |S_m(\sigma; t) - G_m(\sigma; t)| \ge \varepsilon\Big) = o(1)$$

for every $\varepsilon > 0$ as $n \to \infty$ and $n_1 \to \infty$. So, (10) and (11) are equivalent. The
following lemma is very useful in various applications of the last result.

Lemma 3.2. Let $(U, d)$ and $(U_1, d_1)$ be separable metric spaces and let $f : U \to U_1$
be a continuous map into $U_1$. If $A$ is a compact subset of $U$ and $Y_m \Rightarrow A$ ($\mathcal{P}$-a.s.)
in $(U, d)$, then $f(Y_m) \Rightarrow f(A)$ ($\mathcal{P}$-a.s.) in the second space $(U_1, d_1)$.

By virtue of this lemma, Theorem 3.1 implies Theorem 2.3, whose assertion
can now be rewritten as

$$G_m(\sigma; 1) \Rightarrow [-1, 1] \quad (\nu\text{-a.s.}).$$

Going along this path, we can list more consequences. 



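Corollary 3.3. The following relations hold ($\nu$-a.s.):

• $(G_m(\sigma; 1/2), G_m(\sigma; 1)) \Rightarrow L := \{(u, v) : u^2 + (v - u)^2 \le 1/2\}$;

• $G_m(\sigma; 1/2) \Rightarrow [-\sqrt{2}/2, \sqrt{2}/2]$;

• if $m'$ is the subsequence for which $G_{m'}(\sigma; 1/2) \to \sqrt{2}/2$, then we have
$G_{m'}(\sigma; \cdot) \to g_1$, where

$$g_1(t) = \begin{cases} \sqrt{2}\,t, & \text{if } 0 \le t \le 1/2, \\ \sqrt{2}/2, & \text{if } 1/2 < t \le 1; \end{cases}$$

• if $m'$ is the subsequence for which $G_{m'}(\sigma; 1/2) \to 1/2$ and $G_{m'}(\sigma; 1) \to 0$,
then we have $G_{m'}(\sigma; \cdot) \to g_2$, where

$$g_2(t) = \begin{cases} t, & \text{if } 0 \le t \le 1/2, \\ 1 - t, & \text{if } 1/2 < t \le 1. \end{cases}$$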
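
The two limit curves of the corollary lie on the boundary of the Strassen set, which can be confirmed numerically. The sketch below is only a sanity check: `energy` is a hypothetical helper that approximates $\int_0^1 (g'(t))^2\,dt$ by finite differences on a dyadic grid (so that the kink at $t = 1/2$ is a grid point), and `in_L` tests membership in the set $L$ with a small tolerance.

```python
import math


def energy(g, steps=2 ** 14):
    """Finite-difference approximation of the integral of g'(t)^2 over [0, 1]."""
    h = 1.0 / steps
    return sum(((g((i + 1) * h) - g(i * h)) / h) ** 2 * h for i in range(steps))


def g1(t):
    return math.sqrt(2) * t if t <= 0.5 else math.sqrt(2) / 2


def g2(t):
    return t if t <= 0.5 else 1.0 - t


def in_L(u, v, tol=1e-12):
    """Membership in L = {(u, v) : u^2 + (v - u)^2 <= 1/2}, up to rounding."""
    return u * u + (v - u) ** 2 <= 0.5 + tol


print(energy(g1), energy(g2))
print(in_L(g1(0.5), g1(1.0)), in_L(g2(0.5), g2(1.0)))
```

Both energies come out as 1 up to rounding, and the pairs $(g_i(1/2), g_i(1))$ lie on the boundary of $L$, in line with the fact that a piecewise linear curve through $(1/2, u)$ and $(1, v)$ has energy $2u^2 + 2(v-u)^2$.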



These sophisticated examples of continuous functionals $f$ can be found in
[6]. Using the fact that the cycle lengths $j_m(\sigma)$ are the success epochs of the partial sums
$s(\sigma; m)$, as in Section 2, we could convert the relations of this Corollary to those
for $j_m(\sigma)$. Nevertheless, we now prefer to do this in functional form.







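First, we introduce a new sequence of processes. Set $j(\sigma; u) = 1$ if $0 \le u < 1$
and $j(\sigma; u) = j_{[u]}(\sigma)$ if $1 \le u \le s = s(\sigma)$. Define

$$\mathcal{J}_m(\sigma; t) := \frac{\log j(\sigma; tm) - tm}{\sqrt{2m\log\log m}}, \qquad 0 \le t \le 1, \quad 3 \le m \le s = s(\sigma).$$

Theorem 3.4. We have

$$\mathcal{J}_m(\sigma; \cdot) \Rightarrow \mathcal{K} \quad (\nu\text{-a.s.})$$

in the space $D$.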


To derive this claim from Theorem 3.1, we can use generalized inverses.
Let $D_0$ denote the subspace of $D$ consisting of nonnegative nondecreasing
functions. For $X \in D_0$, we define $X^{-1} \in D_0$ by

$$X^{-1}(t) = \inf\{u \in [0,1] : X(u) > t\}$$

with the agreement $X^{-1}(t) = 1$ for $X(1) \le t \le 1$. A useful auxiliary result for the
proof has been provided by W. Vervaat [16].



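Lemma 3.5. Let $X_m \in D_0$ and $\delta_m$ be a sequence of positive numbers, $\delta_m \to 0$. If
$g = g(t) \in C$, then as $m \to \infty$ the following relations are equivalent:

$$\sup_{t \in [0,1]} \Big|\frac{X_m(t) - t}{\delta_m} - g(t)\Big| \to 0$$

and

$$\sup_{t \in [0,1]} \Big|\frac{X_m^{-1}(t) - t}{\delta_m} + g(t)\Big| \to 0.$$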



For the proof of Theorem 3.4, it suffices to apply (11) and Lemma 3.5 with
$g \in \mathcal{K}$,

$$X_m = X_m(\sigma, t) = s(\sigma; m^t)/\log m,$$

and $\delta_m = ((2L_3 m)/\log m)^{1/2}$. In the style of Corollary 3.3, we have,
for instance, for the processes $\mathcal{J}_m(\sigma; \cdot)$ of Theorem 3.4,

$$\mathcal{J}_m(\sigma; 1/2) \Rightarrow [-\sqrt{2}/2, \sqrt{2}/2], \qquad \mathcal{J}_m(\sigma; 1) - \mathcal{J}_m(\sigma; 1/2) \Rightarrow [-\sqrt{2}/2, \sqrt{2}/2] \quad (\nu\text{-a.s.}).$$

To check the second relation here, we can use the two-dimensional convergence
pointed out in the first item of Corollary 3.3. The last two relations show some
fascinating symmetry between the behaviour of $\log j(\sigma; tm)$ and $\log\big(j(\sigma; m)/j(\sigma; (1-t)m)\big)$
at the point $t = 1/2$. We have some other observations of a similar
symmetry. In what general forms can such a phenomenon appear?



4. Proof of Theorem 2.3 

The main probabilistic ingredient is found in the classical paper of W. Feller [5].
Let $X_n$, $n \ge 1$, be independent random variables (r.vs), $\mathbf{E}X_n = 0$, $\mathbf{E}X_n^2 < \infty$,
and $B_n^2 := \mathbf{E}X_1^2 + \cdots + \mathbf{E}X_n^2 \to \infty$. Here as previously $n \to \infty$. Denote
$Y_n = X_1 + \cdots + X_n$.

Lemma 4.1. Let $C > 0$ be a constant and $\lambda_n$ be a positive increasing sequence. In
addition, assume that the r.vs $X_n$, $n \ge 1$, satisfy

$$|X_n| \le C B_n/\lambda_n^3$$

with probability one. If the series

$$\sum_{n \ge 1} \frac{\mathbf{E}X_n^2}{B_n^2}\,\lambda_n\,e^{-\lambda_n^2/2} \tag{12}$$

converges, then

$$P(Y_n > \lambda_n B_n \text{ infinitely often}) = 0.$$

If the series (12) diverges, then

$$P(Y_n > \lambda_n B_n \text{ infinitely often}) = 1.$$



We will need just a corollary for independent Bernoulli r.vs $\xi_n$, $n \ge 1$, such
that $P(\xi_n = 1) = p_n = 1 - P(\xi_n = 0)$. We reformulate it for the sum
$\zeta_n := \xi_1 + \cdots + \xi_n$ in a slightly modified form.



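Corollary 4.2. Let the Bernoulli r.vs satisfy the condition $p_n = 1/n + O(1/n^{1+\varepsilon})$,
where $\varepsilon > 0$ is arbitrary. Then, for every $0 < \delta < 1$ and $k \ge 3$, we have

$$\lim_{n_1 \to \infty} \lim_{n \to \infty} P_n' := \lim_{n_1 \to \infty} \lim_{n \to \infty} P\Big(\max_{n_1 \le m \le n} \frac{|\zeta_m - \log m|}{\gamma_{km}(1 + \delta)} \ge 1\Big) = 0$$

and

$$\lim_{n_1 \to \infty} \lim_{n \to \infty} P_n'' := \lim_{n_1 \to \infty} \lim_{n \to \infty} P\Big(\max_{n_1 \le m \le n} \frac{|\zeta_m - \log m|}{\gamma_{km}(1 - \delta)} \ge 1\Big) = 1.$$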



Our approach, which originated in probabilistic number theory (see [11]), has two
steps. The first one is based upon the following lemma.

Lemma 4.3 (Fundamental Lemma). There exist a probability space $\{\Omega, \mathcal{F}, P\}$ and
independent Poisson r.vs $Z_j$, $\mathbf{E}Z_j = 1/j$, $j \ge 1$, such that

$$|\nu_n((k_1(\sigma), \ldots, k_r(\sigma)) \in A) - P((Z_1, \ldots, Z_r) \in A)| \le Cr/n$$

with an absolute constant $C > 0$ uniformly in $A \subset \mathbb{Z}_+^r$ for each $1 \le r \le n$.

The assertion of this lemma follows from the Feller coupling. A much more
precise estimate of this total variation distance is proved in [2]. It is also shown
there that the distance does not vanish when the condition $r = o(n)$ is not satisfied.
So, as in probabilistic number theory (see [11]), the Fundamental Lemma allows us to
deal with additive functions "truncated" up to $r$. The remainder appearing in this
procedure can be estimated by the following inequality obtained in the author's paper
[12].
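
As a small sanity check on the Poisson comparison of Lemma 4.3, the mean of $k_j(\sigma)$ under the uniform measure can be computed exactly for small $n$: it equals $1/j$ for every $j \le n$, matching $\mathbf{E}Z_j = 1/j$. The sketch below enumerates all permutations, so it is only feasible for very small $n$; the function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def cycle_counts(perm):
    """k_j(sigma) for a permutation of {0, ..., n-1}, as a Counter {j: k_j}."""
    seen, counts = [False] * len(perm), Counter()
    for start in range(len(perm)):
        length, i = 0, start
        while not seen[i]:
            seen[i] = True
            i = perm[i]
            length += 1
        if length:
            counts[length] += 1
    return counts


def mean_cycle_counts(n):
    """Exact mean of k_j under the uniform measure on S_n, for j = 1, ..., n."""
    total, size = Counter(), 0
    for perm in permutations(range(n)):
        total.update(cycle_counts(perm))
        size += 1
    return {j: Fraction(total[j], size) for j in range(1, n + 1)}


print(mean_cycle_counts(5))
```

Agreement of the means is of course much weaker than the total variation bound of the lemma; it only illustrates why Poisson variables with these parameters are the natural comparison.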

Lemma 4.4. Let $\{h_j(k)\}$, $k \ge 0$ and $j \ge 1$, be a two-dimensional array of real
numbers such that $h_j(0) = 0$. If $Z_j$, $j \ge 1$, are the Poisson r.vs as in Lemma 4.3,
then for arbitrary $x > 0$, $b_m \in \mathbb{R}$, $1 \le r \le m \le n$, we have



Unl max 

r<j<n 



^ ^ b-n 

j<m 



> X \ < 32e^P I max 

r<j<n 



j<m 



> x/3 . 



In particular, for arbitrary dr >-•> dn > 0 and aj G R, j > 1, 



Un I max dn 

r<m<n 






j<m 



j<m 



> x\< 



288e^ / ,9 

j<r r<j<n 







Proof of Theorem 2.3. We start from the observation which actually explains
why the results on the sequences $\{w(\sigma; m)\}$ and $\{s(\sigma; m)\}$ are so close. This further
leads to the similarity of estimates valid for $\{J_m(\sigma)\}$ and $\{j_m(\sigma)\}$. If $\varepsilon > 0$ and a
positive sequence $\psi(m) \to \infty$ are arbitrary, then by Lemma 4.4 we have

( i^^x \s{a] m) — w{a] m)\ > 6 

\ni <m<n 



< 32e^P ( max ^ 

rii 



Y, {Zj - i{Zj > 1}) 



j<m 



> e/3 



< 32e^P Y \Z ‘3 - HZj > 1}| > e?/’(ni)/3 = o(l) 

\i<n j 

as n ^ OO and rii oo. 

Take r = r{n) = n/Ln > rii and some arbitrary e > 0. By virtue of the last 
estimate and Lemma 4.4, we obtain 

Vn ( max 7fcm(l \s{cr-,m) - logm| > 1 ) 

\ r<m<n J 

<Vn{ max 7fcm(l + - log’ll > 1/2 )+ ^'rn(l/2) 

\r<m<n J 

Hence, with such a choice of r we have 






< 



s(a;m) - log ml . 
max ■ ^ > 1 + o(l)- 



^ \ni<m<r 7fcm(l + <^) 

By the fundamental Lemma this may be rewritten as 



<n < P 



max 7 fcm(l + (5) 

rii <m<r 



-1 



j<m 



m 



>1 + 0 ( 1 ) 



For the Bernoulli r.vs $\xi_j := \mathbf{1}\{Z_j \ge 1\}$, $j \ge 1$, we may apply Corollary 4.2. It
yields the first assertion of Theorem 2.3.
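
That the indicators $\mathbf{1}\{Z_j \ge 1\}$ satisfy the hypothesis of Corollary 4.2 comes from $P(Z_j \ge 1) = 1 - e^{-1/j}$ and the elementary bound $0 \le x - (1 - e^{-x}) \le x^2/2$ for $x \ge 0$, which gives $p_j = 1/j + O(1/j^2)$. A quick numerical illustration, with hypothetical helper names:

```python
import math


def p(j):
    """P(Z_j >= 1) for a Poisson variable Z_j with mean 1/j."""
    return -math.expm1(-1.0 / j)


def max_scaled_gap(j_max):
    """Maximum over j <= j_max of j^2 * |p(j) - 1/j|."""
    return max(j * j * abs(p(j) - 1.0 / j) for j in range(1, j_max + 1))


print(max_scaled_gap(10 ** 4))
```

The printed value stays below $1/2$, in agreement with the bound above.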

To prove the second one, it suffices to observe that 

/ \s(a:m) — logml \ 

> I'n max . .. - q = + o{l) 

and again to use the corollary with = ^j. Theorem 2.3 is proved. 



References 

[1] R. Arratia and S. Tavare, (1992) Limit theorems for combinatorial structures via
discrete process approximations, Random Structures and Algorithms 3, 321-345.

[2] R. Arratia, A. Barbour, and S. Tavare, Logarithmic Combinatorial Structures: a
Probabilistic Approach, EMS Monographs in Math., The EMS Publishing House,
Zurich, 2003.

[3] G.J. Babu and E. Manstavicius, (1999) Brownian motion for random permutations,
Sankhyā A: The Indian J. of Statistics 61, No. 3, 312-327.

[4] J.M. DeLaurentis and B.G. Pittel, Random permutations and the Brownian motion,
Pacific J. Math. 119 (1985), No. 2, 287-301.

[5] W. Feller, The general form of the so-called law of the iterated logarithm, Trans.
AMS 54 (1943), 373-402.

[6] S. D. Freedman, Brownian Motion and Diffusion, Holden-Day, San Francisco, 1971.

[7] V.L. Goncharov, On the distribution of cycles in permutations, Dokl. Acad. Nauk
SSSR 35 (1942), 9, 299-301 (Russian).

[8] V.L. Goncharov, Some facts from combinatorics, Izv. Akad. Nauk SSSR, Ser. Mat. 8
(1944), 3-48 (Russian); On the field of combinatory analysis, Transl. AMS 19 (1962),
1-46.

[9] V.F. Kolchin, A problem of the allocation of particles in cells and cycles of random
permutations, Teor. Veroyatnost. i Primenen. 16 (1971), No. 1, 67-82 (Russian).

[10] V.F. Kolchin, Random Mappings, Optimization Software, Inc. New York, 1986.

[11] E. Manstavicius, Functional approach in the divisor distribution problems, Acta
Math. Hungarica 66 (1995), No. 3, 343-359.

[12] E. Manstavicius, The law of iterated logarithm for random permutations, Lith. Math.
J. 38 (1998), No. 2, 160-171.

[13] L.A. Shepp and S.P. Lloyd, Ordered cycle lengths in a random permutation, Trans.
AMS 121 (1966), 340-357.

[14] A.M. Vershik and A.A. Schmidt, Symmetric groups of high order, Dokl. Acad. Nauk
SSSR 206 (1972), No. 2, 269-272 (Russian).

[15] A.M. Vershik and A.A. Schmidt, Limit measures that arise in the asymptotic theory
of symmetric groups I, Teor. Veroyatnost. i Primenen. 22 (1977), No. 1, 72-88; II,
ibid. 23 (1978), No. 1, 42-54 (Russian).

[16] W. Vervaat, Success epochs in Bernoulli trials with applications in number theory,
Math. Centrum, Amsterdam, 1972.



Eugenijus Manstavicius

Vilnius University, Department of Mathematics and Informatics

eugenijus.manstavicius@maf.vu.lt







Transcendence of Generating Functions of 
Walks on the Slit Plane 

Martin Rubey 



ABSTRACT: Consider a single walker on the slit plane, that is, the square
grid $\mathbb{Z}^2$ without its negative x-axis, who starts at the origin and takes his steps from
a given set $\mathfrak{S}$. Mireille Bousquet-Mélou conjectured that - excluding pathological
cases - the generating function counting the number of possible walks is algebraic
if and only if the walker cannot cross the negative x-axis without touching it. In
this paper we prove a special case of her conjecture.



1. Introduction 



Let $\mathfrak{S}$ - the set of steps - be a finite subset of $\mathbb{Z}^2$. A walk on the slit plane is a
sequence $(0,0) = w_0, w_1, \ldots, w_n$ of points in $\mathbb{Z}^2$, such that the difference of two
consecutive points $w_{i+1} - w_i$ belongs to the set of steps $\mathfrak{S}$ and none of the points
but the first lie on the half-line $\{(x, 0) : x \le 0\}$. An example for such a walk with
set of steps

$$\mathfrak{S} = \{(-1,-2), (-1,1), (-1,2), (1,-2), (1,1), (1,2)\}$$

is shown in Figure 1.
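
For small lengths, walks on the slit plane can be counted by a direct dynamic program over end points. The sketch below is illustrative: it assumes that the forbidden half-line includes the origin (so that a walk may not return to it), and the function name and the constant `FIGURE_1_STEPS` are choices made for this example.

```python
from collections import defaultdict


def slit_plane_walk_counts(steps, n_max):
    """Number of walks on the slit plane of length 0, ..., n_max with the given steps."""
    current = {(0, 0): 1}  # end point -> number of walks
    totals = [1]
    for _ in range(n_max):
        nxt = defaultdict(int)
        for (x, y), ways in current.items():
            for dx, dy in steps:
                px, py = x + dx, y + dy
                if py == 0 and px <= 0:  # the forbidden half-line
                    continue
                nxt[(px, py)] += ways
        current = dict(nxt)
        totals.append(sum(current.values()))
    return totals


FIGURE_1_STEPS = [(-1, -2), (-1, 1), (-1, 2), (1, -2), (1, 1), (1, 2)]
print(slit_plane_walk_counts(FIGURE_1_STEPS, 6))
```

Keeping the dictionary `current` instead of only the totals also gives the number of walks ending at each prescribed point, which is the quantity refined by the end coordinates in the generating functions considered below.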

Recall that a generating function $F(t) = \sum_{n \ge 0} f_n t^n$ is algebraic, if there is a
nontrivial polynomial $P$ in two variables, such that $P(F(t), t) = 0$. Otherwise, it
is transcendental.

In [2] Mireille Bousquet-Melou conjectured the following: 



Conjecture 1.1. Consider the generating function for walks in the slit plane with a
given set of steps $\mathfrak{S}$, counted according to their length and their end-coordinates:

$$S(x, y; t) = \sum_{W} t^{\text{length}(W)} x^{\text{x-final}(W)} y^{\text{y-final}(W)},$$

where the sum runs over all walks $W$ on the slit plane starting at the origin with
steps in $\mathfrak{S}$.

Suppose that the set of steps is not degenerate and thus all four quadrants of the
plane can be reached by some walk, and that the greatest common divisor of the
vertical parts of the steps is equal to one.

Then this generating function is algebraic in $t$, if and only if the height of
any step is at most one.



In fact, she proved one part of this conjecture in Section 7 of the above paper, 
namely, that walks with steps that have height at most one have an algebraic 
generating function. Furthermore, in Section 8 she proved for one family of step- 
sets that the corresponding generating functions have to be transcendental. In the 
present paper, we prove the following: 



^Research supported by the Austrian Science Foundation FWF, grant S8302-MAT. 








Figure 1. A walk on the slit plane 



Theorem 1.2. Let $\mathcal{H}$ and $\mathcal{V}$ be two finite sets of integers, the greatest common
divisor of the integers in each set being equal to one. Furthermore, assume that
both of the sets $\mathcal{H}$ and $\mathcal{V}$ contain positive and negative numbers, and that $\mathcal{V}$ contains
an element with absolute value at least 2. Finally, assume that the minimum of $\mathcal{V}$
is at least $-2$.

Let $\mathfrak{S}$ be the Cartesian product of the two sets: $\mathfrak{S} = \mathcal{H} \times \mathcal{V}$, where $\mathcal{H}$ is the
horizontal and $\mathcal{V}$ is the vertical part of the steps. Then the following generating
functions for walks in the slit plane with set of steps $\mathfrak{S}$ are transcendental in $t$:

• the generating function $S_{i,0}(t)$ for walks ending at a prescribed coordinate
$(i, 0)$,

• the generating function $L(t)$ for loops, i.e., walks that return to the origin,

• the generating function $S_0(1; t)$ for walks ending anywhere on the x-axis,
and

• $S(1, 1; t)$, which is the generating function for walks ending anywhere in
the slit plane.

For example, the set of steps of the walk in Figure 1 is the Cartesian product
of $\mathcal{H} = \{-1, +1\}$ and $\mathcal{V} = \{-2, +1, +2\}$.

In fact we consider a slightly more general problem: we allow the steps in
$\mathcal{H}$ and $\mathcal{V}$ to be weighted with positive real numbers. The weight of a step in the
product set $\mathfrak{S} = \mathcal{H} \times \mathcal{V}$ then is the product of the weights of its corresponding
vertical and horizontal parts and the weight of a walk is the product of the weights
of its individual steps.

As in [2], we will use a special case of the following theorem to determine in
which cases the generating function cannot be algebraic:







Theorem 1.3. [4] Let $F(t)$ be an algebraic function over $\mathbb{Q}$ that is analytic at the
origin. Then its Taylor coefficient $f_n$ has an asymptotic equivalent of the form

$$f_n = \frac{\beta^n n^s}{\Gamma(s+1)} \sum_{i=0}^{m} C_i \omega_i^n + O(\beta^n n^t),$$

where $s \in \mathbb{Q} \setminus \{-1, -2, -3, \ldots\}$, $t < s$; $\beta$ is a positive algebraic number and the
$C_i$ and $\omega_i$ are algebraic with $|\omega_i| = 1$.

It follows easily that an algebraic function cannot have an appearance of a
negative integer power of $n$ anywhere in the full asymptotic expansion of its Taylor
coefficients.



2. An expression for the generating function for walks on the slit 
plane 

The fundamental theorem for walks on the slit plane is the following: 



Theorem 2.1. (Proposition 9 in [2]) Let

$$B(x; t) = \sum_{W} x^{\text{x-final}(W)} t^{\text{length}(W)},$$

where the sum runs over all walks $W$ on $\mathbb{Z}^2$ starting at the origin, ending on the
x-axis, with steps in $\mathfrak{S}$,
be the generating function for bilateral walks, that is, walks that end on the x-axis
but are otherwise unconstrained.

For $i \ge 1$, the generating function $S_{i,0}(t)$ for walks on the slit plane ending
at $(i, 0)$ can be computed by induction on $i$ via the following identity:


k—l ii-\-i2~\ Hk=i 

n>0,^2>0,...,ifc>0 



Note that it follows that $\log B(x; t)$ has positive Taylor coefficients.

Now we can take advantage of the special structure of the step set $\mathfrak{S}$. Since
it decomposes into a horizontal and a vertical part, the generating function for
bilateral walks factorises:

$$B(x; t) = B(H(x)t), \tag{1}$$

where

$$B(t) = \sum_{W} t^{\text{length}(W)},$$

the sum running over all walks $W$ on $\mathbb{Z}$ from 0 to 0 with steps in $\mathcal{V}$,
is the generating function for bridges and

$$H(x) = \sum_{h \in \mathcal{H}} x^h$$

is the step (Laurent-) polynomial for $\mathcal{H}$.
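
The factorisation says that a bilateral walk of length $n$ with steps in the product set is a bridge of length $n$ in the vertical direction paired with an arbitrary horizontal sequence of $n$ steps. This can be checked on small cases. The sketch below uses hypothetical function names and the unweighted step sets of Figure 1; it compares the number of bilateral walks ending at $(i, 0)$ with $b_n \cdot [x^i] H(x)^n$, where $b_n$ is the number of bridges of length $n$.

```python
from collections import defaultdict


def bridge_counts(V, n_max):
    """b_n = number of walks on Z from 0 to 0 with n steps in V, for n = 0, ..., n_max."""
    current, out = {0: 1}, [1]
    for _ in range(n_max):
        nxt = defaultdict(int)
        for y, ways in current.items():
            for v in V:
                nxt[y + v] += ways
        current = nxt
        out.append(current.get(0, 0))
    return out


def power_coefficients(H, n):
    """Coefficients of the Laurent polynomial H(x)^n as {exponent: coefficient}."""
    poly = {0: 1}
    for _ in range(n):
        nxt = defaultdict(int)
        for e, c in poly.items():
            for h in H:
                nxt[e + h] += c
        poly = nxt
    return dict(poly)


def bilateral_counts(H, V, n):
    """Walks of length n on Z^2 with steps in H x V ending on the x-axis, by final x."""
    current = {(0, 0): 1}
    for _ in range(n):
        nxt = defaultdict(int)
        for (x, y), ways in current.items():
            for h in H:
                for v in V:
                    nxt[(x + h, y + v)] += ways
        current = nxt
    return {x: c for (x, y), c in current.items() if y == 0}


H, V = [-1, 1], [-2, 1, 2]
b = bridge_counts(V, 5)
for n in range(6):
    predicted = {e: c * b[n] for e, c in power_coefficients(H, n).items()} if b[n] else {}
    print(n, bilateral_counts(H, V, n) == predicted)
```

With weighted steps the same check goes through after replacing the counts by sums of weights.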







3. Transcendence of $[x^i] \log B(x; t)$

In this section we show that $[x^i] \log B(x; t)$ cannot be algebraic if the set $\mathcal{V}$ contains
an element with absolute value strictly greater than one.

To this end, we consider the asymptotic expansion of $[t^n x^i] \log B(x; t)$. Since
$B(x; t)$ factorises, we have

$$[t^n x^i] \log B(x; t) = [t^n x^i] \log B(H(x)t) = [x^i](H(x))^n\,[t^n] \log B(t).$$

Therefore, we have divided the problem in two; we will show that the asymptotic
expansions of both $[x^i](H(x))^n$ and $[t^n] \log B(t)$ contain a term $n^{-k/2}$ for
some odd $k$.


3.1. Asymptotics of the horizontal part 

Let min be the minimal integer such that H{x)x'^'^'^ is a polynomial. To deter- 
mine the asymptotics of [x^] (H{x))^ = (ii'(x)x"^^’^)’^, we can use the 

following theorem: 



Theorem 3.1 ([3, 5]). Let g{z) be an analytic function of degree d with positive 
coefficients assumed to be aperiodic and such that ^(0) ^ 0, and let a(z) be analytic 
except possibly at zero, where a pole is allowed. Let X be a positive number of some 
subinterval [Aq,A 5 ] of the open interval ]0,d[. Then, with N = [Xn\, one has 
uniformly for A G [Aa, A^] 



where ( is the unique positive root of the equation 



g'iO 

5(C) 



= A 



and 

R= ^ [log 5(C) - A log C]. 



Let p = gcd{fc + min : k G TC} and note that in the case p > 1, every 
p^^ coefficient of {H{x))^ will be zero. Thus, we take g{x) = and 

a(x) == x~'^!^ , and use the theorem to determine [x’^ "^^’^/^]a(x)(p(x))’^. In this 
situation, A == minfp is constant, therefore ( and R must be constant, too. Hence, 
the asymptotic expansion of [x*] [H(x))^ contains an appearance of 



3.2. Asymptotics of the vertical part 

To determine the asymptotic behaviour of $[t^n] \log B(t)$ we will use analysis of
singularities. In a first step, we have to determine the singularities of the expression.
Following the general theory of singularity analysis, all the contributions from these
singularities must be added up.

According to the remark after Theorem 2.1 and the factorisation (1), we have
that $[t^n] \log B(t)$ is always positive. Therefore we can apply Pringsheim's theorem:

Theorem 3.2. [6] If a function with a finite radius of convergence has Taylor
coefficients that are nonnegative, then one of its singularities of smallest modulus - a
dominant singularity - is real positive.







Since the logarithm is singular only at the origin, and $B(t)$ is strictly
positive for positive real numbers, Pringsheim's theorem implies that one dominant
singularity of $\log B(t)$ is in fact a singularity - call it $\rho$ - of $B(t)$. Of course, there
can be other dominant singularities of $\log B(t)$ arising from singularities of $B(t)$.
We will discuss them below.

Furthermore, it might happen - although we believe that it does not - that
$B(t)$ has zeros on the circle around the origin with radius $\rho$, thus also making
$\log B(t)$ singular. However, such singularities can never contribute a summand of
order $n^{-k/2}$ for odd $k$, so we can simply ignore them. Note that $B(t)$ cannot vanish
for $|t| < \rho$, since $\rho$ is a dominant singularity of $\log B(t)$.

Now we want to compute the contribution of the dominant singularities of
$B(t)$ to the asymptotic expansion of $\log B(t)$. To do so, we need a better
understanding of $B(t)$, which is the generating function for bridges with step set $\mathcal{V}$.

Luckily, this generating function has already been studied. We define the step (Laurent-) polynomial for $\mathcal{V}$ as

$$V(y) = \sum_{v \in \mathcal{V}} y^v$$

and the characteristic curve determined by $\mathcal{V}$ by the equation

$$1 - tV(y) = 0 \quad\text{or equivalently}\quad y^{\min_v} - t\,\bigl(y^{\min_v} V(y)\bigr) = 0, \qquad (2)$$

where $\min_v = -\min \mathcal{V}$ is the minimal integer to make the equation polynomial.

We say that the functional equation (2) is reduced, if the greatest common divisor of the exponents of the monomials in $V(y)$ is equal to one, which is one of the assumptions in our main Theorem 1.2.

We say that the functional equation (2) has period $p$, if the greatest common divisor of the exponents of the monomials in $V(y)$ is equal to $p$.

As is well known, the period is also the number of dominant singularities of $B(t)$, which are all conjugate to the real dominant singularity $\rho$. In our case however, it can be seen ([1, Section 3.3]) that the asymptotic formula for bridges is obtained from the asymptotic expansion derived from the singularity at $\rho$ by multiplying with $p$. Since we are only interested in the presence or absence of a term $n^{-k/2}$ for some odd $k$ in the asymptotic expansion, we can assume from now on that the functional equation (2) is aperiodic, i.e., has period one.

It can be seen [1, 6] that the solutions of this functional equation organise 
themselves into “small” and “large” branches. Here, “small” means that the solu- 
tion y{t) tends to zero as t tends to zero, whereas “large” means that y{t) tends 
to infinity as t approaches zero. 

It is only the set of "small" solutions that is interesting for us, and it can be seen - using a limit case of Pellet's Theorem, see for example [7] - that there are $\min_v$ of them. A nice expression for the generating function for bridges is given by the following theorem:

Theorem 3.3. [1, Theorem 1 and proof of Theorem 3] The generating function for bridges is an algebraic function given by

$$B(t) = t\,\frac{d}{dt}\log\bigl(y_1(t)\,y_2(t)\cdots y_{\min_v}(t)\bigr) = \frac{1}{2\pi i}\oint \frac{dy}{y\,(1 - tV(y))},$$

where the expressions involve all the small branches $y_1, y_2, \dots, y_{\min_v}$ of the characteristic curve (2), and $\rho$ is the radius of convergence of $B(t)$. Furthermore, the principal branch $y_1(t)$, i.e., the branch with real coefficients, has a square root singularity at $\rho$ and the product of all the other small branches is analytic for all $t$ with $|t| \le \rho$. More precisely, $\rho$ is given by $\rho = 1/V(\tau)$, where $\tau$ is the unique positive number with $V'(\tau) = 0$.
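
The quantities appearing in this theorem are easy to explore numerically. The following sketch is not part of the original text; the function names `bridge_counts`, `step_poly` and `structural_constants`, and the bisection bounds, are assumptions made purely for illustration. It counts bridges by brute-force dynamic programming and locates $\tau$ (the positive zero of $V'$) and $\rho = 1/V(\tau)$ for a given step set.

```python
def bridge_counts(steps, n_max):
    """b[n] = number of n-step walks on the integers from 0 back to 0, steps taken from `steps`."""
    counts, dist = [], {0: 1}
    for _ in range(n_max + 1):
        counts.append(dist.get(0, 0))
        nxt = {}
        for pos, ways in dist.items():
            for s in steps:
                nxt[pos + s] = nxt.get(pos + s, 0) + ways
        dist = nxt
    return counts


def step_poly(steps, y, order=0):
    """order-th derivative of the step Laurent polynomial V(y) = sum of y**v over v in steps."""
    total = 0.0
    for v in steps:
        coeff = 1
        for j in range(order):
            coeff *= v - j
        total += coeff * y ** (v - order)
    return total


def structural_constants(steps, lo=1e-6, hi=1e6):
    """Return (tau, rho) with V'(tau) = 0, tau > 0, and rho = 1 / V(tau).

    Assumption: the step set contains a negative and a positive step, so that
    V' is negative near 0, positive for large y, and changes sign only once.
    """
    for _ in range(200):
        mid = (lo * hi) ** 0.5          # bisection on a logarithmic scale
        if step_poly(steps, mid, 1) < 0:
            lo = mid
        else:
            hi = mid
    tau = (lo * hi) ** 0.5
    return tau, 1.0 / step_poly(steps, tau)


motzkin = (-1, 0, 1)                    # illustrative step set
print(bridge_counts(motzkin, 5))
print(structural_constants(motzkin))
```

For an aperiodic step set the counts grow like $\rho^{-n}$ up to a polynomial factor, so the printed $\rho$ can be compared with ratios of successive bridge counts.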

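Thus, applying the Newton-Puiseux theorem, we can develop $B(t)$ around the singularity $\rho$, setting $\Delta = \sqrt{\rho - t}$:

$$B(t) = \frac{a_{-1}}{\Delta} + a_0 + a_1\Delta + a_2\Delta^2 + \cdots$$

Composing this expansion with the Taylor expansion of the logarithm we obtain

$$\log B(t) = \log\frac{a_{-1}}{\Delta} + \log\Bigl(1 + \frac{a_0}{a_{-1}}\Delta + \frac{a_1}{a_{-1}}\Delta^2 + \cdots\Bigr).$$

Now we want to find a term $n^{-k/2}$ for some odd $k$ in the asymptotic expansion of the coefficients of the above series. Clearly,

$$[t^n]\log\frac{a_{-1}}{\Delta} = [t^n]\log\frac{a_{-1}}{\sqrt{\rho - t}} \sim c_0\,\rho^{-n} n^{-1}$$

for some constant $c_0$, is not what we are looking after. However, we have

$$\log\Bigl(1 + \frac{a_0}{a_{-1}}\Delta + \frac{a_1}{a_{-1}}\Delta^2 + \cdots\Bigr) = \frac{a_0}{a_{-1}}\Delta + \cdots$$

and

$$[t^n]\,\frac{a_0}{a_{-1}}\Delta \sim c_1\,\frac{a_0}{a_{-1}}\,\rho^{-n} n^{-3/2}$$

for some constant $c_1$. Provided that $a_0$ does not vanish, this term will guarantee transcendence of the generating function for walks on the slit plane. Thus we need the constant term in the singular expansion of $B(t)$.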

Since the product of the non-principal branches $y_2(t)\,y_3(t)\cdots y_{\min_v}(t)$ is analytic and non-zero at $\rho$, the contribution of $t\frac{d}{dt}\log(y_2(t)\cdots y_{\min_v}(t))$ to the constant term in the singular expansion of $B(t)$ around $\rho$ is the sum of the residues of $1/(y(1 - \rho V(y)))$ at the zeros of $1 - \rho V(y)$ that are strictly smaller than $\tau$ in modulus.

To obtain the contribution of $t\frac{d}{dt}\log(y_1(t))$, we proceed as follows:

$$[\Delta^0]\,t\frac{d}{dt}\log(y_1(t)) = [\Delta^0]\,(\rho - \Delta^2)\Bigl(-\frac{1}{2\Delta}\Bigr)\frac{d}{d\Delta}\log(y_1(t)) = -\rho\,[\Delta^2]\log(y_1(t)).$$







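To obtain the coefficient of $\Delta^2$ in $z = \log(y_1(t))$, we consider the Taylor expansion of $0 = G(t,z) = 1 - tV(e^z)$ around $(\rho, \log\tau)$, where $\tau = y_1(\rho)$. We set $\tilde z = z - \log\tau$ and write $G$ short for $G(\rho, \log\tau)$, subscripts denote the partial derivative:

$$0 = G(t,z) = G - G_t\Delta^2 + G_z\tilde z - G_{t,z}\Delta^2\tilde z + \tfrac12 G_{z,z}\tilde z^2 + \tfrac16 G_{z,z,z}\tilde z^3 + \cdots,$$

since $G_{t,t} = 0$. We have

$$G = 0$$
$$G_t = -V(\tau) = -\frac{1}{\rho}$$
$$G_z = -\rho\tau V'(\tau) = 0$$
$$G_{t,t} = 0$$
$$G_{t,z} = -e^z V'(e^z)\big|_{(t=\rho,\,z=\log\tau)} = 0$$
$$G_{z,z} = -\rho\,\bigl(\tau V'(\tau) + \tau^2 V''(\tau)\bigr) = -\rho\tau^2 V''(\tau)$$

Therefore, substituting $\tilde z = \alpha\Delta + \beta\Delta^2 + O(\Delta^3)$ into the Taylor expansion we obtain

$$0 = -G_t\Delta^2 + \tfrac12 G_{z,z}\tilde z^2 + \tfrac16 G_{z,z,z}\tilde z^3 + O(\Delta^4)
= \Bigl(-G_t + \tfrac12 G_{z,z}\alpha^2\Bigr)\Delta^2 + \Bigl(G_{z,z}\alpha\beta + \tfrac16 G_{z,z,z}\alpha^3\Bigr)\Delta^3 + O(\Delta^4).$$

Thus

$$\alpha^2 = \frac{2}{\rho^2\tau^2 V''(\tau)}, \qquad \beta = -\frac{\tau V'''(\tau) + 3V''(\tau)}{3\,(\rho\tau V''(\tau))^2}. \qquad (3)$$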



It is easy to check that $-\rho\beta$, where $\beta$ is the coefficient of $\Delta^2$ in the singular expansion of $\log(y_1(t))$, is exactly one half of the residue of $1/(y(1 - \rho V(y)))$ at $\tau$.
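
The relation between the coefficient $\beta$ and the residue at $\tau$ can be checked numerically. The sketch below is an illustration only, under stated assumptions: it takes $\beta = -(\tau V'''(\tau) + 3V''(\tau))/(3(\rho\tau V''(\tau))^2)$, evaluates the residue of $1/(y(1-\rho V(y)))$ at $\tau$ by the trapezoidal rule on a small circle, and compares one half of it with $-\rho\beta$. The helper names, the circle radius and the sample step set $\{-1, 2\}$ are hypothetical choices, and the radius must be small enough that $\tau$ is the only pole inside the circle.

```python
import cmath
import math


def step_poly(steps, y, order=0):
    """order-th derivative of V(y) = sum of y**v over v in steps (y may be complex)."""
    total = 0.0
    for v in steps:
        coeff = 1
        for j in range(order):
            coeff *= v - j
        total += coeff * y ** (v - order)
    return total


def tau_rho(steps, lo=1e-6, hi=1e6):
    """Positive zero tau of V' (by bisection) and rho = 1 / V(tau)."""
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if step_poly(steps, mid, 1) < 0:
            lo = mid
        else:
            hi = mid
    tau = math.sqrt(lo * hi)
    return tau, 1.0 / step_poly(steps, tau)


def beta_coefficient(steps):
    """beta = -(tau V''' + 3 V'') / (3 (rho tau V'')^2), all derivatives taken at tau."""
    tau, rho = tau_rho(steps)
    v2, v3 = step_poly(steps, tau, 2), step_poly(steps, tau, 3)
    return -(tau * v3 + 3 * v2) / (3 * (rho * tau * v2) ** 2)


def residue_at_tau(steps, radius=0.1, samples=256):
    """Residue of 1 / (y (1 - rho V(y))) at y = tau via a discretised contour integral."""
    tau, rho = tau_rho(steps)
    total = 0j
    for k in range(samples):
        w = radius * cmath.exp(2j * cmath.pi * k / samples)   # w = y - tau
        y = tau + w
        total += w / (y * (1 - rho * step_poly(steps, y)))
    return total / samples


steps = (-1, 2)                      # illustrative step set
tau, rho = tau_rho(steps)
print(residue_at_tau(steps).real / 2, -rho * beta_coefficient(steps))
```

For the symmetric step set $\{-1, 0, 1\}$ both quantities vanish, in line with the fact that the constant term can only be nonzero when the step set is less regular.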

In summary, we have shown the following: 



Theorem 3.4. Consider bridges with set of steps $\mathcal{V}$. Let $y_1(t), y_2(t), \dots, y_{\min_v}(t)$ be the solutions of the functional equation $1 - t\sum_{v\in\mathcal{V}} y^v = 0$ that tend to zero as $t$ goes to zero, $y_1(t)$ being the branch with real positive Taylor coefficients. Let $\tau$ be the unique positive real number with $V'(\tau) = 0$ and let $\rho = 1/V(\tau)$. Furthermore, let $\tau_k = y_k(\rho)$ for $k \in \{2, 3, \dots, \min_v\}$.

Then the constant term in the singular expansion of the generating function for bridges is

$$\frac12\operatorname*{Res}_{y=\tau}\frac{1}{y\,(1 - \rho V(y))} + \sum_{k=2}^{\min_v}\operatorname*{Res}_{y=\tau_k}\frac{1}{y\,(1 - \rho V(y))}. \qquad (4)$$



Unfortunately, we were not able to show that this expression does not vanish 
if the step set contains a step of height strictly greater than one. However, we have 
the following conjecture, that we are able to prove partially for some special cases: 







Conjecture 3.5. Let $V(y)$ be a Laurent-polynomial with positive coefficients with highest exponent equal to $\max_v$ and lowest exponent equal to $-\min_v$. Let $\tau$ be the unique positive solution of $V'(\tau) = 0$ and

$$f_<(y) = \prod_{\substack{V(\kappa) = V(\tau)\\ |\kappa| < \tau}} (y - \kappa) = \sum_{k=0}^{\min_v - 1} a_k y^k$$

and

$$f_>(y) = \prod_{\substack{V(\kappa) = V(\tau)\\ |\kappa| > \tau}} (y - \kappa) = \sum_{k=0}^{\max_v - 1} b_k y^k.$$

Consider the decomposition

$$\frac{1}{y\,(V(y) - V(\tau))} = \frac{\alpha + \beta y}{(y - \tau)^2} + \frac{p_<(y)}{f_<(y)} + \frac{p_>(y)}{f_>(y)}, \qquad (5)$$

where the degree of $p_<$ is $\min_v - 2$ and the degree of $p_>$ is $\max_v - 2$.

Then

$$0 < a_0 < a_1 < \cdots < a_{\min_v - 1} \qquad (6)$$
$$0 < b_{\max_v - 1} < b_{\max_v - 2} < \cdots < b_0, \qquad (7)$$

the leading term of $p_<$ is negative if $\min_v > 1$ and the leading term of $p_>$ is positive if $\max_v > 1$.

The negativity of the leading term of $p_<$ in the conjecture would already imply that the constant term of the singular expansion of $B(t)$ around $\rho$ does not vanish: $-\frac{1}{\rho}\bigl(\frac{\beta}{2} + [y^{\min_v - 2}]\,p_<(y)\bigr)$ is exactly the value given by (4). Since replacing $y$ by $1/y$ in the Laurent-polynomial $V(y)$ changes the sign of $\beta$ in the decomposition (5), we can assume that $\beta$ is negative or zero. In fact, if $\min_v = 1$, it follows from (3) that $\beta$ is negative. Since $[y^{\min_v - 2}]\,p_<(y) < 0$ for $\min_v > 1$, the claim follows.

We can prove parts of the conjecture for $\min_v \le 3$: In general, it is easy to see that the product of $f_<$ and $f_>$ has positive coefficients and that both of their constant terms must be positive.

If $\min_v \le 2$ we can show (6) and (7) by inductive arguments. We were also able to check the case $\min_v = 3$ and $\max_v \le 4$.

If $\min_v = 2$ we can also show that $[y^{\min_v - 2}]\,p_<(y) < 0$: in this case, $p_<(y)$ is constant and equals $1/(\tau_2 V'(\tau_2))$, where $\tau_2$ is the only negative zero of $V(y) = V(\tau)$ which is smaller than $\tau$ in modulus. Since $V(y)$ tends to infinity as $y$ approaches $0^-$, we have that $V'(\tau_2) > 0$, which implies the claim.

Finally, if $\min_v = 2$ and the coefficients of $V$ are either zero or one, we can also conclude that $\beta$ is negative: in this case the numerator of (3) equals

$$-\tau V'''(\tau) - 3V''(\tau) + 3\tau^{-1}V'(\tau) = -3a_{-1}\tau^{-3} + 3a_1\tau^{-1} - 15a_3\tau + \cdots$$







If $a_1 = 0$ then the above expression is trivially negative. Otherwise we have to show that $3\tau^{-1} < 15\tau$. We show that $\tau > \frac12$, which is sufficient: We have

$$V'(y) = -2y^{-3} + \sum_{k \ge -1} k\,a_k\,y^{k-1} < -2y^{-3} + \frac{1}{(1-y)^2},$$

which is negative for $y < \frac12$.



4. Transcendence 



It is now a simple matter to complete the proof of the main Theorem 1.2: Since in the circumstances of the theorem the asymptotic expansion of

$$[x^i t^n]\log B(x;t) = [x^i](H(x))^n\,[t^n]\log B(t)$$

contains a term $n^{-2}$, the series $[x^i]\log B(x;t)$ cannot be algebraic. When $i$ is minimal such that there is at least one walk in the slit plane with steps in $\mathfrak{S}$ ending at $(i,0)$, Theorem 2.1 gives that $[x^i]\log B(x;t)$ is the generating function for such walks. To settle the transcendence of $S_{i,0}(t)$ for general $i$, we only need to note that

log5(x; t) ~ cop~'^n~^, where, as we proved in the last section, cq = ~ 



and thus does not vanish. Hence, the leading term of log B{x] t) 

contains a factor of . Thus, in the convolution formula for term 

l/ii? in the asymptotic expansion of [x'^]log B{x;t) cannot be cancelled by terms 
of the asymptotic expansion of the product of two or more functions Si.^Q. 

The proof of the non-D-finiteness of the other functions can be copied ver- 
batim from the proof of Proposition 22, page 282 of [2]. 



5. Acknowledgements 

I would like to thank Michael Drmota, Bernhard Gittenberger, Bernhard Lamel and Bodo Laß for numerous stimulating discussions concerning the nature of the solutions of the functional equation (2). Also I am very grateful to two anonymous referees who pointed out numerous mistakes and a wrong conjecture appearing in the manuscript. And, of course, I would like to thank Mireille Bousquet-Mélou for introducing me to the problem and for a wonderful stay in Bordeaux.



References 

[1] Cyril Banderier and Philippe Flajolet, Basic analytic combinatorics of directed lattice 
paths, Theoret. Comput. Sci. 281 (2002), no. 1-2, 37-80, Selected papers in honour 
of Maurice Nivat. MR 2003g:05006 

[2] Mireille Bousquet-Mélou, Walks on the slit plane: other approaches, Advances in Applied Mathematics 27 (2001), no. 2-3, 243-288, Special issue in honor of Dominique Foata's 65th birthday (Philadelphia, PA, 2000). MR 2002j:60076

[3] Michael Drmota, A bivariate asymptotic expansion of coefficients of powers of gener- 
ating functions, European Journal of Combinatorics 15 (1994), no. 2, 139-152. MR 
94k:05014 







[4] Philippe Flajolet, Analytic models and ambiguity of context-free languages, Theoret- 
ical Computer Science 49 (1987), no. 2-3, 283-309, Twelfth international colloquium 
on automata, languages and programming (Nafplion, 1985). MR 89e:68067 

[5] Philippe Flajolet and Robert Sedgewick, The average case analysis of algorithms, 
1994. 

[6] Einar Hille, Analytic function theory. Vol. I, II. 2nd ed. corrected., Chelsea Publish- 
ing Company, 1973. 

[7] Morris Marden, Geometry of polynomials. Second edition. Mathematical Surveys, 
No. 3, American Mathematical Society, Providence, R.I., 1966. MR 37 #1562 

Martin Rubey 

LaBRI, Université Bordeaux I, Research financed by EC's IHRP Programme, within the Research Training Network "Algebraic Combinatorics in Europe", grant HPRN-CT-2001-00272
martin.rubey@labri.fr




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Some Curious Extensions of the Classical Beta 
Integral Evaluation 

Michael Schlosser 



ABSTRACT: We deduce curious $q$-series identities by applying an inverse relation to a certain identity for basic hypergeometric series. After rewriting some of these identities in terms of $q$-integrals, we obtain, in the limit $q \to 1$, curious integral identities which generalize the classical beta integral evaluation.



1. Introduction 



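Euler's beta integral evaluation (cf. [1, Eq. (1.1.13)])

$$\int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}, \qquad \Re(\alpha), \Re(\beta) > 0, \qquad (1)$$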

is one of the most important and prominent identities in special functions. In 
Andrews, Askey and Roy’s modern treatise [1], the beta integral (and its various 
extensions) runs like a thread through their whole exposition. 

An unusual extension of (1) was recently found by George Gasper and the 
present author in [4, Th. 5.1] and reads as follows. 



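$$\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} = (c - (a+1)^2)\int_0^1 \frac{(c - a(a+t))^{\beta}\,(c - (a+1)(a+t))^{\beta-1}}{(c - (a+t)^2)^{2\beta}}\;{}_2F_1\!\left[\begin{matrix}\alpha-\beta-1,\ -\beta\\ \alpha\end{matrix};\frac{(a+t)\,t}{c - a(a+t)}\right] t^{\alpha-1}(1-t)^{\beta-1}\,dt, \qquad (2)$$

provided $\Re(\alpha), \Re(\beta) > 0$. It is clear that (2) reduces to (1) when either $c \to \infty$ or $a \to \infty$. Two special cases of (2) where the ${}_2F_1$ in the integrand can be simplified are $\alpha = \beta + 1$ and $\alpha = \beta$. Specifically, we have

$$\frac{\Gamma(\beta)\Gamma(\beta)}{2\,\Gamma(2\beta)} = (c - (a+1)^2)\int_0^1 \frac{(c - a(a+t))^{\beta}\,(c - (a+1)(a+t))^{\beta-1}}{(c - (a+t)^2)^{2\beta}}\,t^{\beta}(1-t)^{\beta-1}\,dt \qquad (3)$$

and

$$\frac{\Gamma(\beta)\Gamma(\beta)}{\Gamma(2\beta)} = (c - (a+1)^2)\int_0^1 \frac{(c - a(a+t))^{\beta-1}\,(c - (a+1)(a+t))^{\beta-1}}{(c - (a+t)^2)^{2\beta}}\,(c - (a-t)(a+t))\,t^{\beta-1}(1-t)^{\beta-1}\,dt, \qquad (4)$$

where in each case $\Re(\beta) > 0$.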
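
A quick numerical sanity check of the two special cases can be reassuring. The sketch below is not from the paper: it evaluates the right-hand sides of the evaluations with $\alpha = \beta + 1$ and $\alpha = \beta$, in the form $(c-(a+1)^2)\int_0^1 \dots\,dt$ with the integrands spelled out in the code comments, by a composite Simpson rule, and compares them with $\Gamma(\beta)^2/(2\Gamma(2\beta))$ and $\Gamma(\beta)^2/\Gamma(2\beta)$. The function names and the sample parameters $a = 1$, $c = 10$ are assumptions chosen so that no pole lies on $[0,1]$.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def rhs_alpha_beta_plus_one(a, c, beta):
    """Case alpha = beta + 1: integrand
    (c-a(a+t))^beta (c-(a+1)(a+t))^(beta-1) / (c-(a+t)^2)^(2 beta) * t^beta (1-t)^(beta-1)."""
    def integrand(t):
        u = a + t
        return ((c - a * u) ** beta * (c - (a + 1) * u) ** (beta - 1)
                / (c - u * u) ** (2 * beta) * t ** beta * (1 - t) ** (beta - 1))
    return (c - (a + 1) ** 2) * simpson(integrand, 0.0, 1.0)


def rhs_alpha_equals_beta(a, c, beta):
    """Case alpha = beta: integrand
    (c-a(a+t))^(beta-1) (c-(a+1)(a+t))^(beta-1) / (c-(a+t)^2)^(2 beta)
    * (c-(a-t)(a+t)) * t^(beta-1) (1-t)^(beta-1)."""
    def integrand(t):
        u = a + t
        return ((c - a * u) ** (beta - 1) * (c - (a + 1) * u) ** (beta - 1)
                / (c - u * u) ** (2 * beta) * (c - (a - t) * u)
                * t ** (beta - 1) * (1 - t) ** (beta - 1))
    return (c - (a + 1) ** 2) * simpson(integrand, 0.0, 1.0)


a, c = 1.0, 10.0                      # illustrative parameters
for beta in (1, 2, 3):
    target = math.gamma(beta) ** 2 / math.gamma(2 * beta)
    print(beta, rhs_alpha_beta_plus_one(a, c, beta), target / 2,
          rhs_alpha_equals_beta(a, c, beta), target)
```

Integer values of $\beta$ keep the integrands smooth on $[0,1]$, so Simpson's rule converges quickly; for non-integer $\beta < 1$ the endpoint singularities would call for a different quadrature.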

In an early version of [4] we claimed that the integral evaluations (3) and (4), 
proved by the same procedure as the integral identities in this paper, “seem to 
be difficult to prove by standard methods”. However, after seeing our preprint [4], 







Mizan Rahman [7] communicated to us a remarkable proof of (3) which involves 
a sequence of manipulations of hypergeometric series [2]. 

Another beta-type integral evaluation which has some similarity to (2), is [4, 
Th. 5.2]. It reads as follows. Let m be a nonnegative integer. Then 



r(^)r(/3) 

2T{2(3) 

X 2F1 



= (c - (a + 

-13, -m. 



1)^) / 

Jo 



^ {c - a{a + 1))^ {c - {a + l){a + 1))^ ^ 



{a + ty 



(c- (a + 02)2/5 

I — et 



[ - 2/3 ’ {c-a{a + t)){l-et)\ 



(Vt) 



(5) 



provided $\Re(\beta) > \max(0, m - 1)$. Some special cases are considered in [4, Sec. 5].

In this paper, we generalize both identities (2) and (5), see Corollary 5.3 and 
Theorem 5.1, respectively. While (5) does not extend the classical beta integral 
evaluation (1), its extension in Theorem 5.1 now does. In order to deduce our 
results, we apply essentially the same machinery which was utilized in [4] with the 
difference that our derivation now makes use of a more general basic hypergeo- 
metric identity (namely, (6)). 

We start with some preliminaries on hypergeometric and basic hypergeometric series, see Section 2. In the same section we also exhibit an explicit matrix inverse which will be crucial in our further analysis. This matrix inverse is applied in Section 3 to derive a new $q$-series identity which we list together with some corollaries. In Section 4 we rewrite two of the obtained identities in terms of $q$-integrals. From these we deduce in Section 5, by letting $q \to 1$, new beta-type integral identities by which we generalize the results from [4].



2. Preliminaries 

2.1. Hypergeometric and basic hypergeometric series 

For a complex number $a$, define the shifted factorial

$$(a)_0 := 1, \qquad (a)_k := a(a+1)\cdots(a+k-1),$$

where $k$ is a positive integer. Let $r$ be a positive integer. The hypergeometric ${}_rF_{r-1}$ series with numerator parameters $a_1, \dots, a_r$, denominator parameters $b_1, \dots, b_{r-1}$, and argument $z$ is defined by

$${}_rF_{r-1}\!\left[\begin{matrix}a_1, \dots, a_r\\ b_1, \dots, b_{r-1}\end{matrix}; z\right] := \sum_{k \ge 0}\frac{(a_1)_k \cdots (a_r)_k}{k!\,(b_1)_k \cdots (b_{r-1})_k}\,z^k.$$

The ${}_rF_{r-1}$ series terminates if one of the numerator parameters is of the form $-n$ for a nonnegative integer $n$. If the series does not terminate, it converges when $|z| < 1$, and also when $|z| = 1$ and $\Re[b_1 + b_2 + \cdots + b_{r-1} - (a_1 + a_2 + \cdots + a_r)] > 0$. See [2, 10] for classic texts on (ordinary) hypergeometric series.

Let $q$ (the "base") be a complex number such that $0 < |q| < 1$. Define the $q$-shifted factorial by

$$(a;q)_\infty := \prod_{j \ge 0}(1 - aq^j), \qquad (a;q)_k := \frac{(a;q)_\infty}{(aq^k;q)_\infty}$$

for integer $k$. The basic hypergeometric ${}_r\phi_{r-1}$ series with numerator parameters $a_1, \dots, a_r$, denominator parameters $b_1, \dots, b_{r-1}$, base $q$, and argument $z$ is defined by

$${}_r\phi_{r-1}\!\left[\begin{matrix}a_1, \dots, a_r\\ b_1, \dots, b_{r-1}\end{matrix}; q, z\right] := \sum_{k=0}^{\infty}\frac{(a_1;q)_k \cdots (a_r;q)_k}{(q;q)_k\,(b_1;q)_k \cdots (b_{r-1};q)_k}\,z^k.$$

The ${}_r\phi_{r-1}$ series terminates if one of the numerator parameters is of the form $q^{-n}$ for a nonnegative integer $n$. If the series does not terminate, it converges when $|z| < 1$. For a thorough exposition on basic hypergeometric series (or, synonymously, $q$-hypergeometric series), including a list of several selected summation and transformation formulas, we refer the reader to [3].

We list two specific identities which we utilize in this paper. 

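First, we have the following three-term transformation (cf. [3, Eq. (III.34)]),

$${}_3\phi_2\!\left[\begin{matrix}a, b, c\\ d, e\end{matrix}; q, \frac{de}{abc}\right] = \frac{(e/b;q)_\infty\,(e/c;q)_\infty}{(e;q)_\infty\,(e/bc;q)_\infty}\;{}_3\phi_2\!\left[\begin{matrix}d/a, b, c\\ d, bcq/e\end{matrix}; q, q\right]$$
$$\qquad + \frac{(d/a;q)_\infty\,(b;q)_\infty\,(c;q)_\infty\,(de/bc;q)_\infty}{(d;q)_\infty\,(e;q)_\infty\,(bc/e;q)_\infty\,(de/abc;q)_\infty}\;{}_3\phi_2\!\left[\begin{matrix}e/b, e/c, de/abc\\ de/bc, eq/bc\end{matrix}; q, q\right], \qquad (6)$$

where $|de/abc| < 1$. Further, we need (cf. [3, Eq. (III.9)])

$${}_3\phi_2\!\left[\begin{matrix}a, b, c\\ d, e\end{matrix}; q, \frac{de}{abc}\right] = \frac{(e/a;q)_\infty\,(de/bc;q)_\infty}{(e;q)_\infty\,(de/abc;q)_\infty}\;{}_3\phi_2\!\left[\begin{matrix}a, d/b, d/c\\ d, de/bc\end{matrix}; q, \frac{e}{a}\right], \qquad (7)$$

where $|de/abc|, |e/a| < 1$.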



2.2. Inverse relations 

Let $\mathbb{Z}$ denote the set of integers and $F = (f_{nk})_{n,k\in\mathbb{Z}}$ an infinite lower-triangular matrix; i.e. $f_{nk} = 0$ unless $n \ge k$. The matrix $G = (g_{kl})_{k,l\in\mathbb{Z}}$ is said to be the inverse matrix of $F$ if and only if

$$\sum_{l \le k \le n} f_{nk}\,g_{kl} = \delta_{nl}$$

for all $n, l \in \mathbb{Z}$, where $\delta_{nl}$ is the usual Kronecker delta.

The method of applying inverse relations [8] is a well-known technique for 
proving identities, or for producing new ones from given ones. 

If $(f_{nk})_{n,k\in\mathbb{Z}}$ and $(g_{kl})_{k,l\in\mathbb{Z}}$ are lower-triangular matrices that are inverses of each other, then

$$\sum_{n \ge k} f_{nk}\,a_n = b_k \qquad (8a)$$

if and only if

$$\sum_{k \ge l} g_{kl}\,b_k = a_l, \qquad (8b)$$

subject to suitable convergence conditions. For some applications of (8) see e.g. [6, 8, 9].

Note that in the literature it is actually more common to consider the following inverse relations involving finite sums,

$$\sum_{k=0}^{n} f_{nk}\,a_k = b_n \quad\text{if and only if}\quad \sum_{l=0}^{k} g_{kl}\,b_l = a_k. \qquad (9)$$
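
To make the finite inverse relation concrete, here is a small sketch using the classical binomial pair $f_{nk} = \binom{n}{k}$, $g_{kl} = (-1)^{k-l}\binom{k}{l}$. This pair is only an illustration and is not the matrix inverse used later in the paper; the function names and the sample sequence are hypothetical.

```python
from math import comb


def f(n, k):
    """Lower-triangular matrix f_{nk} = C(n, k); comb returns 0 when k > n."""
    return comb(n, k)


def g(k, l):
    """Its inverse g_{kl} = (-1)^(k-l) C(k, l)."""
    return (-1) ** (k - l) * comb(k, l)


def transform(a):
    """b_n = sum_{k=0}^{n} f_{nk} a_k."""
    return [sum(f(n, k) * a[k] for k in range(n + 1)) for n in range(len(a))]


def inverse_transform(b):
    """a_k = sum_{l=0}^{k} g_{kl} b_l."""
    return [sum(g(k, l) * b[l] for l in range(k + 1)) for k in range(len(b))]


a = [3, 1, 4, 1, 5, 9]               # arbitrary sample sequence
b = transform(a)
print(b, inverse_transform(b))
```

The same two helper functions can be reused with any other explicitly known pair of mutually inverse lower-triangular matrices.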

It is clear that in order to apply (8) (or (9)) effectively, one should have some explicit matrix inversion at hand. The following result, which is a special case of Krattenthaler's matrix inverse [6], will be crucial in our derivation of new identities. It can be regarded as a bridge between $q$-hypergeometric and certain non-$q$-hypergeometric identities. (For some other such matrix inverses, see [9].)

Lemma 2.1 (MS [9, Eqs. (7.18)/(7.19)]). Let 



fnk 






{T,Q)r 



V c-a{a-\-bq^) ^^)n—k 



9kl 



(c- (a + («,*){« + ,»)) 

Then the infinite matrices $(f_{nk})_{n,k\in\mathbb{Z}}$ and $(g_{kl})_{k,l\in\mathbb{Z}}$ are inverses of each other.



3. Some curious $q$-series expansions

Proposition 3.1. Let a, b, c, d and e be indeterminate. Then 



{^T9)c 



(c - (a + l)(a -h b)) (c - (a + bq^Y) 



{b^^q; q) 



X 3<^2 






^ (c - (a + l)(a + bq’^)) (c - (a + b)(a + bq^)) 

e.q^,\jb'^ 

, (^’ 9 )k{d; q)k (e-g^+6q'») ’ ^)fc ( ^ c-a%+frg>=)~ ’^)oo 

(e/d; g)oo ( 1 / 6 ; g)oo (b^eg; q)^ ^ (c- (a+ l)(a + b)) 

(l/ 62 qr; q)^ {b‘^eq/d-, q)^ (e; q')oc. ^ (c - (a + l)(a + 

(c - (a + bq^Y) 



■ 3<f>2 



bn.b^eg/d, ‘•rvli'-i 

b‘^q^, b‘^eq^'^^ 



7 cnl 2 fc + l 



(c- (a + 6 ) (a + 6 ( 7 ^)) 

(6; g)fc(d;g)fc( jy^fej,) ;g) 



',q,Q 



{q-,q)k{bW,q)k 



beq 

~d 



( 11 ) 



provided |^eg/d| < 1 . 

Proof of Proposition 3.1. Let the inverse matrices $(f_{nk})_{n,k\in\mathbb{Z}}$ and $(g_{kl})_{k,l\in\mathbb{Z}}$ be defined as in Lemma 2.1. Then (8a) holds for

_ {d;q)n (b^eqy^ 

(e;9)nV d ) 







and 



_ fb^eqY ’ ^)oo \e/d, 1/6, 



(e-,q)k \ d ) (b^q;q)^ ( (g ±A g ■ g) ^ [ e 9 ^ 1/62 

(e/d; g)oc (1/6; g)oo (62eg*^+i; g)oo ( 



;?>9 



+ 



(1/62^; g)oo (62gg/d; g)oo (e?''; q)oo { ^c-X+S^) 



jd; q)h 
(e; q)k 



Peq 

~d~ 



302 



bq, h^eq/d, ^ 



(q+bg^)g" 



2(a+6gfc).^ ^ 



by (6). This implies the inverse relation (8b), with the above values of $a_n$ and $b_k$. After performing the shift $k \mapsto k + l$, and the substitutions $a \mapsto aq^{l}$, $c \mapsto cq^{2l}$, $e \mapsto eq^{-l}$ we get rid of $l$ and eventually obtain (11). □



Corollary 3.2. Let $a$, $b$, $c$, $d$ and $e$ be indeterminate. Then



1 = 



(c- (a + l)(a + 6)) (c - (a + 6g'')2) {h-,q)k{d]q)k 



OO 

^ (c - (a + l)(a + 6g*)) (c - (a + 6)(a + bq^)) (g; q)k (e; q)k 

{a-\-hq^) 



X 302 



1/6, dg^, 



A: (g+bg^)g^ ,2 

c— a(a4-^g^) . ^ 



^ ’ c— a(a+^)g^) 



< c— a(a+6g^) ’ oo 
( {a+hq^)hq \ 
Vc— a(a+6g^) ’ oo 



b^Y 

d / 



(12) 



provided $|beq/d| < 1$ and $|b^2eq/d| < 1$.

Proof. Apply (6) to the right-hand side of (11), with respect to the simultaneous substitutions $a \mapsto dq^k$, $b \mapsto 1/b$, $c \mapsto (a + bq^k)q^k/(c - a(a + bq^k))$, $d \mapsto eq^k$, $e \mapsto (a + bq^k)bq^{k+1}/(c - a(a + bq^k))$. □

Corollary 3.3. Let a, b, c, d and e be indeterminate. Then 



(e;g)oo {b'^eq/d; q) 

OO 

{be; q) 

OO {beq/d; q) 

OO 

^ (6; q)k{d; q)k , 
{q;q)k{be;q)k^^^ 



OO 



E 



(c - (g + l)(g + 6)) (c - (g + 6g'^)2) 

(c - (g + l)(g + bq^)) (c - (g + 6)(g + bq^)) 



1/6, bq, 



{a-\-bq^)bq 

c-a{a-\-bq^) 



beq/d, 



(g+bg^)bg^+ 
(g+6g^) 



i ; q, beq 



{a-\-bq^) 
c— g(g+bg^) 
{a-\-bq^)bq 
c— g(g+6g^) 





k 



(13) 



provided $|beq/d| < 1$ and $|be| < 1$.

Proof. Apply (7) to the ${}_3\phi_2$ on the right-hand side of (12), with respect to the simultaneous substitutions $a \mapsto 1/b$, $b \mapsto dq^k$, $c \mapsto (a + bq^k)q^k/(c - a(a + bq^k))$, $d \mapsto (a + bq^k)bq^{k+1}/(c - a(a + bq^k))$, $e \mapsto eq^k$, and divide both sides of the resulting identity by $(be;q)_\infty (beq/d;q)_\infty / \bigl((e;q)_\infty (b^2eq/d;q)_\infty\bigr)$. □

We will make use of Proposition 3.1 and of Corollary 3.3 in our derivation of 
new beta integral identities. 







4. $q$-Integrals

In the following we restrict ourselves to real $q$ with $0 < q < 1$. Thomae [11] introduced the $q$-integral defined by

$$\int_0^1 f(t)\,d_qt = (1 - q)\sum_{k=0}^{\infty} f(q^k)\,q^k. \qquad (14)$$



Later Jackson [5] gave a more general $q$-integral which however we do not need here.

By considering the Riemann sum for a continuous function $f$ over the closed interval $[0, 1]$, partitioned by the points $q^k$, $k \ge 0$, one easily sees that

$$\lim_{q \to 1^-}\int_0^1 f(t)\,d_qt = \int_0^1 f(t)\,dt.$$
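
The definition of the $q$-integral as the sum $(1-q)\sum_{k\ge0} f(q^k)q^k$, and its limit as $q \to 1^-$, can be illustrated with a few lines of code. The sketch below is an illustration under stated assumptions: the name `q_integral` and the truncation threshold `tol` are hypothetical, and the infinite sum is simply cut off once $q^k$ falls below that threshold.

```python
def q_integral(f, q, tol=1e-15):
    """Thomae's q-integral of f over [0, 1]: (1 - q) * sum_{k>=0} f(q^k) q^k.

    The sum is truncated once q^k <= tol, which is harmless for bounded f.
    """
    total, x = 0.0, 1.0          # x runs through the nodes q^k
    while x > tol:
        total += f(x) * x
        x *= q
    return (1 - q) * total


square = lambda t: t * t
for q in (0.5, 0.9, 0.99, 0.999):
    # exact q-integral of t^2 is 1 / (1 + q + q^2); the ordinary integral is 1/3
    print(q, q_integral(square, q), 1 / (1 + q + q * q))
```

For $f(t) = t^2$ the geometric series can be summed in closed form, which makes the convergence to $1/3$ as $q \to 1^-$ explicit.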



It is well known that many identities for $q$-series can be written in terms of $q$-integrals, which then may be specialized (as $q \to 1$) to ordinary integrals. For instance, the $q$-binomial theorem (cf. [3, Eq. (II.3)])

$$\sum_{k=0}^{\infty}\frac{(a;q)_k}{(q;q)_k}\,z^k = \frac{(az;q)_\infty}{(z;q)_\infty}, \qquad |z| < 1, \qquad (15)$$

can be written, when $a \mapsto q^\beta$ and $z \mapsto q^\alpha$, as

$$\int_0^1 \frac{(qt;q)_\infty}{(q^\beta t;q)_\infty}\,t^{\alpha-1}\,d_qt = \frac{\Gamma_q(\alpha)\Gamma_q(\beta)}{\Gamma_q(\alpha+\beta)}, \qquad (16)$$

where

$$\Gamma_q(x) := (1 - q)^{1-x}\,\frac{(q;q)_\infty}{(q^x;q)_\infty} \qquad (17)$$

is the $q$-gamma function, introduced by Thomae [11], see also [1, § 10.3] and [3, § 1.11]. In fact, (16) is a $q$-extension of the beta integral evaluation (1).
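
The $q$-beta integral can be verified numerically for a fixed base. The following sketch is illustrative only: it assumes the identity in the form $\int_0^1 \frac{(qt;q)_\infty}{(q^\beta t;q)_\infty} t^{\alpha-1} d_qt = \Gamma_q(\alpha)\Gamma_q(\beta)/\Gamma_q(\alpha+\beta)$ with $\Gamma_q(x) = (1-q)^{1-x}(q;q)_\infty/(q^x;q)_\infty$, and the helper names and truncation thresholds are hypothetical choices. The infinite products are truncated once their factors are numerically indistinguishable from 1, which works for moderate $q$ (for $q$ very close to 1 the product $(q;q)_\infty$ underflows).

```python
def qpoch_inf(a, q, eps=1e-17):
    """(a; q)_infinity = prod_{j>=0} (1 - a q^j), truncated when |a q^j| <= eps."""
    prod, power = 1.0, 1.0
    while abs(a * power) > eps:
        prod *= 1 - a * power
        power *= q
    return prod


def q_gamma(x, q):
    """q-gamma function (1 - q)^(1 - x) (q; q)_inf / (q^x; q)_inf."""
    return (1 - q) ** (1 - x) * qpoch_inf(q, q) / qpoch_inf(q ** x, q)


def q_beta_integral(alpha, beta, q, tol=1e-16):
    """q-integral over [0, 1] of (qt; q)_inf / (q^beta t; q)_inf * t^(alpha - 1)."""
    total, t = 0.0, 1.0
    while t > tol:
        weight = qpoch_inf(q * t, q) / qpoch_inf(q ** beta * t, q)
        total += weight * t ** (alpha - 1) * t      # f(q^k) * q^k
        t *= q
    return (1 - q) * total


alpha, beta, q = 1.5, 2.5, 0.5       # illustrative parameters
lhs = q_beta_integral(alpha, beta, q)
rhs = q_gamma(alpha, q) * q_gamma(beta, q) / q_gamma(alpha + beta, q)
print(lhs, rhs)
```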

We will rewrite the identities in Proposition 3.1 and in Corollary 3.3 in terms of $q$-integrals. These will then be utilized in Section 5 to obtain new extensions of the beta integral evaluation.

Starting with (11), if we replace $b$ by $q^\beta$, $d$ by $eq^{\beta+1-\alpha}$, and multiply both sides of the identity by

$$\frac{(e;q)_\infty}{(eq^{\beta+1-\alpha};q)_\infty},$$

we obtain the following $q$-beta-type integral identity:



(e; g)oo _ Pg(2^ + 1) (c — (g + l)(g + q^)) 

(eg/3+i-«; q)^ ~ p,(/? + l)Tq{(3) ^ (c - (g + l)(g + qH)) 



( 18 ) 







{c-(a + qHf) 
{c-{a + qP){a + qH)) 



9 ’c-aia+qet).q^q 

et, q-^<^ 



{<lt',Q)oo {et;q)oo (e-g(t.%gf)i^)oo (^c-a(a+q<3t)*i^)oo f 

i,^t; qU {eq^^^-t; qU <l)^ ^)oc ' 

V,{-2p-l)T,{a + 0) (c-(a + l)(g + /)) 

r,(^) r,(o -(3-1) r,(-/3) io (c - (a + l)(a + qH)) 



(c-ia + qH)'^) ^ L/?+l (a+g^y+^t T 

(c-(a + g/3)(a + g/3t))3‘^ 1^ ^2/3+2^ g^2/3+l^ 

(gt; g)^ (eg^^+^t; q)^ ( ; g)^ 

Similarly, starting with (13), if we replace $b$ by $q^\beta$, $d$ by $eq^{\beta+1-\alpha}$, and multiply


f (g+/t) . \ 

^c-a{a-\-q^t) ’ oo ,q:-1 






both sides of the identity by 



$$(1 - q)\,\frac{(q;q)_\infty\,(eq^{\beta};q)_\infty}{(q^{\beta};q)_\infty\,(eq^{\beta+1-\alpha};q)_\infty},$$

we obtain the following $q$-beta-type integral evaluation:

rq(a) Tq{(3) (e;g)oo ^ (c - (a + l)(a + g^)) 
Tg{a + P) (eg^+i-«;g)oo io (c - (a + l)(a + g/^t)) 



(c - (g + qH)^) 

{c - {a + qf^){a + qf^t)) ^ ^ 



-13 13+1 (a+/t)g° 

y 5^ 5 e(c-a(a+qr/3^)) ^ ^ 

a (Q+g^t)g^t ’ ’ 

^ ’ c— a(a+g^i) 



(9^; 9) oo {eqH] q) 



< (g+g^t) . \ / (g+g^t)g^+^t \ 

< c— a(a+g^t) ’ oo V c— a (a-\-q0t) ’^/oo ,a-l 



(a0fa) a) f (QH-g^t)t x q ' \ ) 

\Q 1 Q.)oo \ Q ihJoo \ c-a{a-\-q^t) ’ oo \ c—a{a-\-qf^t) ’ oo 



5. Curious beta-type integrals 

Observe that $\lim_{q \to 1^-}\Gamma_q(x) = \Gamma(x)$ (see [3, (1.10.3)]) and

$$\lim_{q \to 1^-}\frac{(q^{\alpha}u;q)_\infty}{(u;q)_\infty} = (1 - u)^{-\alpha}$$

for constant $u$ (with $|u| < 1$), due to (15) and its $q \to 1$ limit, the ordinary binomial theorem.
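
Both limits can be observed numerically. The sketch below is an illustration under stated assumptions: it uses $\Gamma_q(x) = (1-q)^{1-x}(q;q)_\infty/(q^x;q)_\infty$ and evaluates every quotient of infinite products factor by factor (the hypothetical helper `qpoch_ratio`), because the individual products underflow in floating point when $q$ is close to 1.

```python
import math


def qpoch_ratio(a, b, q, eps=1e-17):
    """(a; q)_inf / (b; q)_inf computed as one product of factor ratios."""
    prod, power = 1.0, 1.0
    while max(abs(a), abs(b)) * power > eps:
        prod *= (1 - a * power) / (1 - b * power)
        power *= q
    return prod


def q_gamma(x, q):
    """q-gamma function (1 - q)^(1 - x) (q; q)_inf / (q^x; q)_inf."""
    return (1 - q) ** (1 - x) * qpoch_ratio(q, q ** x, q)


alpha, u = 1.5, 0.3                  # illustrative parameters
for q in (0.9, 0.99, 0.999):
    print(q,
          qpoch_ratio(q ** alpha * u, u, q), (1 - u) ** (-alpha),
          q_gamma(2.5, q), math.gamma(2.5))
```

The deviation from the limiting values shrinks roughly in proportion to $1 - q$, which is the rate one expects from the leading correction term.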

We thus immediately deduce, as consequences of our $q$-integral identities from Section 4, new beta integral identities. We implicitly assume that the integrals are well defined, in particular that the parameters are chosen such that no poles occur on the path of integration $t \in [0, 1]$ and the integrals converge.

We first consider the beta-type integral identity obtained from multiplying both sides of (18) by

$$\frac{\Gamma(\beta)\,\Gamma(\beta+1)}{\Gamma(2\beta+1)},$$

and letting $q \to 1^-$.






Theorem 5.1. Let $\Re(\alpha), \Re(\beta) > 0$. Then



mm 

2r(2/3) 



(1 - 



= {c-{a + lf) f 
JO 



^ (c - a(a + 1))^ (c - (a + l)(a + ^ 



X 2-f’l 



0 (c-{a + t)m 

a- (3-1, -/3_ c-(g + t)^ 

“2^ ’ {c — a{a + t)){l — et) 



{1 - m {1 - m dt 



^ 2r{2(3)T{-/3)T{a- (3-1)^ ^ 7o ^ 

^ + 1 , a + / 3 _ c — (g + 



(c-(g+l)(g + t))^ ^ 

(c - g(g + 0)^+1 ^ ^ 



2/3 + 2 ’ — g(g + f))(l — ei) 

X ( 20 ) 

Note that (20) can be further rewritten using Legendre's duplication formula

$$\Gamma(2\beta) = \frac{1}{\sqrt{\pi}}\,2^{2\beta-1}\,\Gamma(\beta)\,\Gamma(\beta + \tfrac12),$$

after which the left hand side becomes

$$\frac{\sqrt{\pi}}{4^{\beta}}\,\frac{\Gamma(\beta)}{\Gamma(\beta + \frac12)}\,(1 - e)^{\beta+1-\alpha}.$$

Clearly, (20) reduces to (5) if $\alpha - \beta - 1 = m$, a nonnegative integer.

Observe that (20) reduces to the classical beta integral evaluation (1) for $e = 0$ and $c \to \infty$ due to the Gauß summation

$${}_2F_1\!\left[\begin{matrix}A, B\\ C\end{matrix}; 1\right] = \frac{\Gamma(C)\,\Gamma(C - A - B)}{\Gamma(C - A)\,\Gamma(C - B)},$$

where $\Re(C - A - B) > 0$, the reflection formula

$$\Gamma(z)\,\Gamma(1 - z) = \frac{\pi}{\sin \pi z},$$

where $z$ is not an integer, and some elementary identities for trigonometric functions, such as

$$\sin(x + y) + \sin(x - y) = \sin x\,\frac{\sin 2y}{\sin y}.$$

Next, we have the beta-type integral identity obtained from (19) by letting $q \to 1^-$.

Theorem 5.2. Let $\Re(\alpha), \Re(\beta) > 0$. Then

$$\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}\,(1 - e)^{\beta+1-\alpha} = (c - (a+1)^2)\int_0^1 \frac{(c - (a+1)(a+t))^{\beta-1}}{(c - (a+t)^2)^{\beta}}\;{}_2F_1\!\left[\begin{matrix}-\beta,\ \beta+1\\ \alpha\end{matrix};\frac{(ce - (1 + ae)(a+t))\,t}{c - (a+t)^2}\right](1 - et)^{1-\alpha}\,t^{\alpha-1}(1-t)^{\beta-1}\,dt. \qquad (21)$$

Clearly, (21) reduces to (1) when $e = 0$ and $c \to \infty$.







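Corollary 5.3. Let $\Re(\alpha), \Re(\beta) > 0$. Then

$$\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} = (c - (a+1)^2)\int_0^1 \frac{(c - a(a+t))^{\beta}\,(c - (a+1)(a+t))^{\beta-1}}{(c - (a+t)^2)^{2\beta}}\;{}_2F_1\!\left[\begin{matrix}-\beta,\ \alpha-\beta-1\\ \alpha\end{matrix};\frac{((1 + ae)(a+t) - ce)\,t}{(c - a(a+t))(1 - et)}\right]\left(\frac{1 - et}{1 - e}\right)^{\beta+1-\alpha} t^{\alpha-1}(1-t)^{\beta-1}\,dt. \qquad (22)$$

Proof. We apply the transformation [2, p. 10, Eq. 2.4(1)]

$${}_2F_1\!\left[\begin{matrix}A, B\\ C\end{matrix};\frac{-z}{1-z}\right] = (1 - z)^{A}\;{}_2F_1\!\left[\begin{matrix}A, C - B\\ C\end{matrix}; z\right], \qquad (23)$$

valid for $|z| < 1$ and $\Re(z) < \frac12$ (conditions which we implicitly assume), to the ${}_2F_1$ on the right-hand side of (21) and divide both sides by $(1 - e)^{\beta+1-\alpha}$. □

Clearly, (22) reduces to (2) for $e = 0$.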
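
The transformation used in the proof is Pfaff's transformation, ${}_2F_1[A, B; C; -z/(1-z)] = (1-z)^{A}\,{}_2F_1[A, C-B; C; z]$, and it is easy to test numerically. The sketch below is only an illustration: the name `hyp2f1`, the fixed number of series terms and the sample parameters are assumptions, and the plain power series is used, so both arguments must lie inside the unit disc.

```python
def hyp2f1(a, b, c, z, terms=400):
    """Partial sum of the Gauss series sum_k (a)_k (b)_k / (k! (c)_k) z^k, for |z| < 1."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
    return total


A, B, C, z = 0.5, 1.2, 2.0, 0.3      # illustrative parameters with z < 1/2
lhs = hyp2f1(A, B, C, -z / (1 - z))
rhs = (1 - z) ** A * hyp2f1(A, C - B, C, z)
print(lhs, rhs)
```

The condition $\Re(z) < \frac12$ guarantees $|z/(1-z)| < 1$, so that the series on the left converges as well.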

As in [4], we observe that by performing various substitutions one may change the form and path of integration of the considered integrals. In particular, using $t \mapsto s/(s+1)$ these integrals then run over the half line $s \in [0, \infty)$.



Acknowledgements 

We thank George Gasper for comments. 

The author was fully supported by an APART fellowship of the Austrian 
Academy of Sciences. 



References 

[1] G. E. Andrews, R. Askey and R. Roy, Special functions, Encyclopedia of Mathematics and Its Applications 71, Cambridge University Press, Cambridge, 1999.

[2] W. N. Bailey, Generalized Hypergeometric Series, Cambridge University Press, Cambridge, 1935; reprinted by Stechert-Hafner, New York, 1964.

[3] G. Gasper and M. Rahman, Basic Hypergeometric Series, Encyclopedia of Mathematics and Its Applications 35, Cambridge University Press, Cambridge, 1990.

[4] G. Gasper and M. Schlosser, Some curious q-series expansions and beta integral evaluations, preprint arXiv:math.CO/0403481.

[5] F. H. Jackson, "On q-definite integrals," Quart. J. Pure Appl. Math. 41 (1910), 193-203.

[6] C. Krattenthaler, "A new matrix inverse," Proc. Amer. Math. Soc. 124 (1996), 47-59.

[7] M. Rahman, private communication, April 2004.

[8] J. Riordan, Combinatorial identities, J. Wiley, New York, 1968.

[9] M. Schlosser, "Some new applications of matrix inversions in $A_r$," Ramanujan J. 3 (1999), 405-461.

[10] L. J. Slater, Generalized Hypergeometric Functions, Cambridge University Press, Cambridge, 1966.

[11] J. Thomae, "Beiträge zur Theorie der durch die Heinesche Reihe . . .," J. reine angewandte Math. 70 (1869), 258-281.







Michael Schlosser 

Institut für Mathematik der Universität Wien, Nordbergstraße 15, A-1090 Wien, Austria

schlosse@ap.univie.ac.at




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Divisor Functions and Pentagonal Numbers 

Klaus Simon 

ABSTRACT: Let $p(n, m)$ be the number of partitions of $n$ with at most $m$ summands¹, $\omega(n) = \frac{n}{2}(3n - 1)$, $n \in \mathbb{Z}$, be the pentagonal numbers² and $\sigma_k(n) = \sum_{d \mid n} d^k$ be the divisor functions. Then $\sigma_0(n)$ - the number of the divisors of $n$ - satisfies

$$-\sigma_0(n) = g(n, 0) + g(n, 1) + \cdots + g(n, n-1) \qquad (1)$$

where

$$g(n, m) = p(n, m) - \sum_{i=1}^{\infty}(-1)^{i+1}\bigl(p(n - \omega(i), m) + p(n - \omega(-i), m)\bigr).$$

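Pentagonal numbers are given by a well-known identity due to Euler

$$(q)_\infty = \prod_{i=1}^{\infty}(1 - q^i) = \sum_{i=-\infty}^{\infty}(-1)^i q^{\omega(i)}. \qquad (2)$$

They are correlated with the number of partitions $p(n)$ of $n \in \mathbb{N}$, generated by

$$\sum_{n=0}^{\infty} p(n)\,q^n = \frac{1}{(q)_\infty}, \qquad (3)$$

through the identity

$$1 = \frac{(q)_\infty}{(q)_\infty} = (1 - q^1 - q^2 + q^5 + q^7 - \cdots)\cdot(p(0)\,q^0 + p(1)\,q^1 + \cdots)$$

or equivalently

$$0 = p(n) - \sum_{j=1}^{\infty}(-1)^{j+1}\bigl(p(n - \omega(j)) + p(n - \omega(-j))\bigr). \qquad (4)$$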

On the other hand, the pentagonal numbers are connected with the divisor function $\sigma_1(n)$, for instance, by³



i=l 



¹ or equivalently: ... in which no part is greater than $m$, see G. Andrews, Number Theory, Dover, 1994

² $\omega(0), \omega(1), \omega(-1), \omega(2), \omega(-2), \dots = 0, 1, 2, 5, 7, \dots$

³ see G. Walz (ed.), Lexikon der Mathematik, Spektrum Akademischer Verlag, Heidelberg, Berlin, 2002







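Our statement (1) is a similar identity for $\sigma_0(n)$. By way of illustration, for $n = 5$ we obtain $-2 = -\sigma_0(5) = g(5,0) + \cdots + g(5,4) = 1 + 0 - 1 - 1 - 1$. The proof of (1) is based on⁴

$$\lim_{n \to \infty}(n - a_n) = \sum_{i=1}^{\infty}\sigma_0(i)\,q^i \qquad (5)$$

where the sequence $a_n$ is defined by $a_0 = 0$ and

$$a_n = 1 + (1 - q^{n-1})\,a_{n-1}. \qquad (6)$$

Iterating the recurrence leads to

$$a_n = \sum_{m=0}^{n-1}\prod_{j=m+1}^{n-1}(1 - q^j) = \sum_{m=0}^{n-1}\frac{(q)_{n-1}}{(q)_m}. \qquad (7)$$

Now, the product $((1 - q)\cdots(1 - q^m))^{-1} = (q)_m^{-1}$ is well-known as generating function of the numbers $p(n, m)$, hence

$$\sum_{n=0}^{\infty} p(n, m)\,q^n = \frac{1}{(q)_m}, \qquad m \ge 0. \qquad (8)$$

With (8) the equation (7) can be written as

$$a_n = \sum_{m=0}^{n-1}\underbrace{(q)_{n-1}\sum_{h=0}^{\infty} p(h, m)\,q^h}_{H_{n,m}(q)}. \qquad (9)$$

For $n \to \infty$ we observe

$$\sum_{n=0}^{\infty} g(n, m)\,q^n = \lim_{n \to \infty} H_{n,m}(q) = \sum_{i=-\infty}^{\infty}(-1)^i q^{\omega(i)}\,\sum_{j=0}^{\infty} p(j, m)\,q^j \qquad (10)$$

with $g(0, m) = 1$ and

$$g(n, m) = p(n, m) - p(n-1, m) - p(n-2, m) + p(n-5, m) + \cdots, \qquad m \ge 1.$$

Note that $p(n, m) = p(n)$, for $m \ge n$. This implies $g(n, m) \overset{(4)}{=} 0$, for $m \ge n \ge 1$. Therefore

$$\lim_{n \to \infty}(n - a_n) = -\sum_{j=1}^{\infty}\bigl(g(j, 0) + \cdots + g(j, j-1)\bigr)\,q^j \qquad (11)$$

which, together with (5), completes the proof of (1).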
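
Identity (1) lends itself to a direct computational check. The sketch below is not from the paper; it assumes the reading $-\sigma_0(n) = g(n,0) + \cdots + g(n,n-1)$ with $g(n,m) = p(n,m) - p(n-1,m) - p(n-2,m) + p(n-5,m) + \cdots$ and $\omega(i) = i(3i-1)/2$, which matches the worked example for $n = 5$. The function names are hypothetical, and $p(n,m)$ is computed as the number of partitions with no part greater than $m$.

```python
from functools import lru_cache


def omega(i):
    """Generalised pentagonal number i (3i - 1) / 2 for any integer i."""
    return i * (3 * i - 1) // 2


@lru_cache(maxsize=None)
def p(n, m):
    """Number of partitions of n into at most m summands (equivalently, parts of size <= m)."""
    if n < 0:
        return 0
    if n == 0:
        return 1
    if m == 0:
        return 0
    return p(n, m - 1) + p(n - m, m)


def g(n, m):
    """g(n, m) = p(n, m) + sum_{i>=1} (-1)^i (p(n - omega(i), m) + p(n - omega(-i), m))."""
    total, i = p(n, m), 1
    while omega(i) <= n:
        total += (-1) ** i * (p(n - omega(i), m) + p(n - omega(-i), m))
        i += 1
    return total


def sigma0(n):
    """Number of divisors of n."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)


print([g(5, m) for m in range(5)])
print(all(sum(g(n, m) for m in range(n)) == -sigma0(n) for n in range(1, 41)))
```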



Klaus Simon 



Swiss Federal Laboratories for Materials Testing and Research (EMPA) 

Lerchenfeldstrasse 5, CH-9014 St. Gallen 

klaus.simon@empa.ch 



⁴ see G. Andrews, D. Crippa and K. Simon, q-Series Arising from the Study of Random Graphs, SIAM Journal on Discrete Mathematics, 10:41-56, 1997




Part II 

Graph Theory 




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



On Combinatorial Hoeffding Decomposition 
and Asymptotic Normality of Subgraph Count 
Statistics 

Mindaugas Bloznelis 



ABSTRACT: Given $k$ and $n$, consider a graph with $k$ vertices and $n$ "blue" edges. We assume that the set of "blue" edges is uniformly distributed among $n$-subsets of $N = \binom{k}{2}$ pairs of vertices. Given a graph $\mathcal{G}$, the number $\mathcal{N}_{\mathcal{G}}$ of blue copies of $\mathcal{G}$ is a $U$-statistic based on the random sample $X_1, \dots, X_n$. We show how the combinatorial Hoeffding decomposition of the random variable $\mathcal{N}_{\mathcal{G}}$ can be applied to establish the asymptotic normality of $\mathcal{N}_{\mathcal{G}}$ as $k, n \to \infty$.



1. Introduction 

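Given a complete graph based on $k$ vertices, let $\mathcal{X} = \{x_1, \dots, x_N\}$ denote the set of edges. Let $\{X_1, \dots, X_n\} \subset \mathcal{X}$ be a random $n$-subset uniformly distributed over the class of $n$-subsets of $\mathcal{X}$. Here $n < N$. We paint edges $X_1, \dots, X_n$ blue. The graph based on $k$ vertices and (random) blue edges is denoted by $G(k,n)$. Given a graph $\mathcal{G}$ let $\mathcal{N}_{\mathcal{G}} = \mathcal{N}_{\mathcal{G}}(k,n)$ denote the number of copies of $\mathcal{G}$ in $G(k,n)$. We are interested in when the random variable $(\mathcal{N}_{\mathcal{G}} - \mathbf{E}\mathcal{N}_{\mathcal{G}})/\sigma(\mathcal{N}_{\mathcal{G}})$ is asymptotically standard normal as $k, n \to \infty$. Here $\sigma^2(\mathcal{N}_{\mathcal{G}})$ denotes the variance of $\mathcal{N}_{\mathcal{G}}$.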

Another random graph model assumes that edges become blue independently with probability $t \in (0, 1)$. Let $\nu_1, \dots, \nu_N$ be independent Bernoulli random variables with success probability $t$, i.e., $\mathbf{P}\{\nu_i = 1\} = 1 - \mathbf{P}\{\nu_i = 0\} = t$ for every $i$. We paint the edge $x_i$ blue if $\nu_i = 1$. The graph based on $k$ vertices and (random) blue edges is denoted by $\widetilde{G}(k,t)$. It is called the Bernoulli random graph. Given a graph $\mathcal{G}$ let $\widetilde{\mathcal{N}}_{\mathcal{G}} = \widetilde{\mathcal{N}}_{\mathcal{G}}(k,t)$ denote the number of copies of $\mathcal{G}$ in $\widetilde{G}(k,t)$. Note that the conditional distribution of $\widetilde{\mathcal{N}}_{\mathcal{G}}$ given the event $\nu_1 + \cdots + \nu_N = n$ coincides with the distribution of $\mathcal{N}_{\mathcal{G}}$. Therefore, the problems of the asymptotic normality of the distributions of $\widetilde{\mathcal{N}}_{\mathcal{G}}$ and $\mathcal{N}_{\mathcal{G}}$ are closely related.

The asymptotic normality of $\widetilde{\mathcal{N}}_{\mathcal{G}}$ as $k \to \infty$ was studied by several authors using different methods (method of moments, Stein's method, projection method and martingale limit theorems). An overview of the results and methods is given in the book of Janson, Łuczak and Ruciński (2000). Let us mention that the first complete description of the conditions that are necessary and sufficient for the asymptotic normality of $\widetilde{\mathcal{N}}_{\mathcal{G}}$ was given by Ruciński (1988).

The asymptotic normality of $\mathcal{N}_{\mathcal{G}}$ is shown in Janson (1990). He considers the random graph process $\{\widetilde{G}(k,t),\ t \in [0,1]\}$, where for every $t$ the random graph $\widetilde{G}(k,t)$ is defined as above, but with $\nu_i = \nu_i(t) = \mathbb{1}_{\{u_i < t\}}$. Here $u_1, u_2, \dots$ denote independent random variables uniformly distributed in $[0,1]$. For every $k = 1, 2, \dots$, the collection of random variables $\{\widetilde{\mathcal{N}}_{\mathcal{G}}(k,t),\ t \in [0,1]\}$ can be viewed as a random process with sample paths in the Skorokhod space $D[0,1]$. Using a martingale convergence theorem Janson (1990) proved a functional limit theorem for the sequence of random processes $\{\widetilde{\mathcal{N}}_{\mathcal{G}}(k,t),\ t \in [0,1]\}$. Substitution of the random time $t_{k,n} = \min\{t : \nu_1(t) + \cdots + \nu_N(t) = n\}$ gives the asymptotic normality of $\widetilde{\mathcal{N}}_{\mathcal{G}}(k, t_{k,n})$. Since the distributions of $\widetilde{\mathcal{N}}_{\mathcal{G}}(k, t_{k,n})$ and $\mathcal{N}_{\mathcal{G}}$ coincide, this implies the asymptotic normality of $\mathcal{N}_{\mathcal{G}}$.

The present paper proposes another approach to the asymptotic normality of $\mathcal{N}_{\mathcal{G}}$. We illustrate this approach by showing the asymptotic normality of the simplest subgraph count statistic: the number $\mathcal{N}_{P_2}$ of 2-stars ($\mathcal{G} = P_2$). Denote $n^* = \min\{n, N - n\}$.

Proposition 1.1. Assume that $\sigma^2(\mathcal{N}_{P_2}) \to \infty$ as $k, n^* \to \infty$. Then the distribution of $(\mathcal{N}_{P_2} - \mathbf{E}\mathcal{N}_{P_2})/\sigma(\mathcal{N}_{P_2})$ is asymptotically standard normal.

The proof combines the projection method and Stein's method. By means of Hoeffding's decomposition, the random variable $\mathcal{N}_{\mathcal{G}}$ is expanded into a sum of mutually uncorrelated $U$-statistics of the random variables $X_1, \dots, X_n$. The decomposition enables us to write the characteristic function $f(t)$ of the (properly standardized) random variable $\mathcal{N}_{\mathcal{G}}$ in Erdős-Rényi form, see Erdős and Rényi (1959). Finally, we show that

$$f'(t) + t f(t) \to 0 \qquad (1)$$

as $k, n^* \to \infty$. This implies the asymptotic normality, see Stein (1970) and Tikhomirov (1976, 2001).

The remaining part of the paper is organized as follows. In Section 2 we construct Hoeffding's decomposition for three examples of subgraph count statistics: the number of 2-stars ($\mathcal{G} = P_2$), the number of triangles ($\mathcal{G} = K_3$) and the number of 4-cycles ($\mathcal{G} = C_4$). In Section 3 we write the Erdős-Rényi representation for the characteristic function $f$ (see (11) below) and prove Proposition 1.1.



2. Decomposition 

Let $(X_1, \dots, X_N)$ denote the random permutation of the sequence $x_1, \dots, x_N$. A real function $t(x_{i_1}, \dots, x_{i_n})$ defined on $n$-subsets $\{x_{i_1}, \dots, x_{i_n}\} \subset \mathcal{X}$ defines the random variable $T = t(X_1, \dots, X_n)$. We assume that the variance $\sigma^2(T) > 0$. Hoeffding's decomposition expands the random variable $T$ in the series of $U$-statistics

$$T = \mathbf{E}T + U_1 + \cdots + U_{n^*}, \qquad n^* = \min\{n, N - n\}. \qquad (2)$$

Here $U_r = \sum_{1 \le i_1 < \cdots < i_r \le n} g_r(X_{i_1}, \dots, X_{i_r})$. The function $g_r$ is defined on $r$-subsets of $\mathcal{X}$ and satisfies $\mathbf{E} g_r(X_{i_1}, \dots, X_{i_r}) = 0$. Furthermore, for every $s < r$ and every $1 \le i_1 < \cdots < i_r \le N$ and $1 \le j_1 < \cdots < j_s \le N$ we have almost surely

$$\mathbf{E}\bigl(g_r(X_{i_1}, \dots, X_{i_r}) \mid X_{j_1}, \dots, X_{j_s}\bigr) = 0. \qquad (3)$$

The kernels $g_r(x_{i_1}, \dots, x_{i_r})$ are linear combinations of conditional expectations $\mathbf{E}(T \mid X_1 = y_1, \dots, X_j = y_j)$ for $1 \le j \le r$ and $\{y_1, \dots, y_j\} \subset \{x_{i_1}, \dots, x_{i_r}\}$. Basic facts about the decomposition and formulas defining $g_r$ can be found in Bloznelis and Götze (2001) and Bloznelis (2003), see also Zhao and Chen (1990). Note that (3) implies that the random variables $U_r$ are mutually uncorrelated. This yields the variance decomposition

$$\sigma^2(T) = \sigma^2(U_1) + \cdots + \sigma^2(U_{n^*}).$$




Decomposition of subgraph count statistics 






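Furthermore using (3) it is easy to show, see Bloznelis and Götze (2001), that

$$\sigma^2(U_r) = \frac{\binom{n}{r}\binom{N-n}{r}}{\binom{N-r}{r}}\,\sigma_r^2 \simeq \binom{N}{r}\,p^r q^r \sigma_r^2 \qquad (4)$$

as $k, n^* \to \infty$. Here $\sigma_r^2 = \mathbf{E} g_r^2(X_1, \dots, X_r)$ and

$$p = \frac{n}{N}, \qquad q = \frac{N - n}{N}.$$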






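2.1. Let $T$ denote the number of blue copies of the 2-star $P_2$ in $G(k,n)$. Given two edges $x, y \in \mathcal{X}$ let $L_{xy}$ be the indicator of the event that $x$ and $y$ are incident. We have $T = \sum_{1 \le i < j \le n} L_{X_i X_j}$. Hoeffding's decomposition is

$$T = \mathbf{E}T + U_1 + U_2, \qquad U_1 = 0, \qquad U_2 = \sum_{1 \le i < j \le n} (L_{X_i X_j} - p_L). \qquad (5)$$

Here $p_L := \mathbf{E} L_{X_i X_j} = 2(k-2)/(N-1)$ and $\sigma_2^2 = p_L(1 - p_L)$. The variance

$$\sigma^2(T) = \sigma^2(U_2) = \frac{\binom{n}{2}\binom{N-n}{2}}{\binom{N-2}{2}}\,\sigma_2^2 \simeq \binom{N}{2}\,p^2 q^2 \sigma_2^2 \qquad \text{as } k, n^* \to \infty. \qquad (6)$$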
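For small $k$ the variance of the 2-star count can be computed exactly by enumerating all $n$-subsets of edges. The sketch below assumes that the exact part of (6) reads $\sigma^2(T) = \binom{n}{2}\binom{N-n}{2}\binom{N-2}{2}^{-1} p_L(1-p_L)$; the function names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def two_star_moments(k, n):
    """Exact mean and variance of the 2-star count over all n-subsets of the edges of K_k."""
    edges = list(combinations(range(k), 2))
    counts = []
    for blue in combinations(edges, n):
        # number of pairs of blue edges that share a vertex
        counts.append(sum(1 for x, y in combinations(blue, 2) if set(x) & set(y)))
    mean = Fraction(sum(counts), len(counts))
    var = Fraction(sum(c * c for c in counts), len(counts)) - mean ** 2
    return mean, var


def two_star_variance_formula(k, n):
    """Assumed exact form of the variance: binomial ratio times p_L (1 - p_L)."""
    N = comb(k, 2)
    p_L = Fraction(2 * (k - 2), N - 1)
    return Fraction(comb(n, 2) * comb(N - n, 2), comb(N - 2, 2)) * p_L * (1 - p_L)


if __name__ == "__main__":
    for k, n in [(4, 2), (4, 3), (5, 4)]:
        print(k, n, two_star_moments(k, n)[1], two_star_variance_formula(k, n))
```

Exact rational arithmetic is used so that the comparison is an equality test rather than a floating-point approximation.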



2.2. Let $T$ denote the number of blue copies of the triangle $K_3$ in $G(k,n)$. Given three edges $x, y, z \in \mathcal{X}$, let $a(x,y,z)$ be the indicator of the event that $x$, $y$ and $z$ make up a triangle. We have $T = \sum_{1 \le i < j < l \le n} a(X_i, X_j, X_l)$. Hoeffding's decomposition is

$$T = \mathbf{E}T + U_1 + U_2 + U_3, \qquad (7)$$

$$g_1 = 0, \qquad g_2(x,y) = \frac{n-2}{N-4}\,(L_{xy} - p_L),$$

$$g_3(x,y,z) = a(x,y,z) - p_a - \frac{1}{N-4}\,(L_{xy} + L_{xz} + L_{yz} - 3p_L).$$

Here $p_a := \mathbf{E} a(X_1, X_2, X_3)$.
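The degeneracy property (3) of the triangle kernels can be checked by exact enumeration on a small complete graph. This is a sketch under the assumption that the kernels are $g_2(x,y) = \frac{n-2}{N-4}(L_{xy} - p_L)$ and $g_3(x,y,z) = a(x,y,z) - p_a - \frac{1}{N-4}(L_{xy} + L_{xz} + L_{yz} - 3p_L)$ with $p_a = \binom{k}{3}\binom{N}{3}^{-1}$; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def triangle_kernels(k, n):
    """Return (edges, p_a, a, g2, g3) for the triangle count on K_k with sample size n."""
    edges = list(combinations(range(k), 2))
    N = len(edges)
    p_L = Fraction(2 * (k - 2), N - 1)
    p_a = Fraction(comb(k, 3), comb(N, 3))

    def L(x, y):
        # 1 if the two distinct edges share a vertex
        return int(bool(set(x) & set(y)))

    def a(x, y, z):
        # 1 if the three distinct edges span exactly three vertices (a triangle)
        return int(len(set(x) | set(y) | set(z)) == 3)

    def g2(x, y):
        return Fraction(n - 2, N - 4) * (L(x, y) - p_L)

    def g3(x, y, z):
        pairs = L(x, y) + L(x, z) + L(y, z) - 3 * p_L
        return a(x, y, z) - p_a - Fraction(1, N - 4) * pairs

    return edges, p_a, a, g2, g3


if __name__ == "__main__":
    edges, p_a, a, g2, g3 = triangle_kernels(5, 4)
    # conditional mean of g3 given two of its arguments: sum over the third edge
    worst = max(abs(sum(g3(x, y, z) for z in edges if z not in (x, y)))
                for x, y in combinations(edges, 2))
    print("largest violation of (3) for g3:", worst)
```

Summing over the free argument is equivalent to taking the conditional expectation here, because the remaining edge is uniform over the edges not yet used.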



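Using (3), (4) we obtain as $k, n^* \to \infty$

$$\frac{\sigma^2(U_2)}{p^4 q^2} \simeq \frac{k^3}{2}, \qquad \frac{\sigma^2(U_3)}{(pq)^3} \simeq \frac{k^3}{6}, \qquad \frac{\sigma^2(T)}{p^3 q^2} \simeq \frac{(2p + 1)\,k^3}{6}. \qquad (8)$$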



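2.3. Let $T$ denote the number of blue copies of the cycle $C_4$ in $G(k,n)$. Given edges $x, y, z, w \in \mathcal{X}$, let $b(x,y,z)$ be the indicator of the event that $x$, $y$ and $z$ make up a path and let $d(x,y,z,w)$ be the indicator of the event that $x$, $y$, $z$ and $w$ make up a cycle. We have $T = \sum_{1 \le i_1 < i_2 < i_3 < i_4 \le n} d(X_{i_1}, X_{i_2}, X_{i_3}, X_{i_4})$. Hoeffding's decomposition is

$$T = \mathbf{E}T + U_1 + U_2 + U_3 + U_4,$$

$$g_1 = 0, \qquad g_2(x,y) = \binom{n-2}{2} \frac{(N-2)(N-3)}{(N-4)(N-5)}\,Q_{\{x,y\}}, \qquad g_3(x,y,z) = (n-3)\,\frac{N-3}{N-6}\,Q_{\{x,y,z\}},$$

$$g_4(x,y,z,w) = d(x,y,z,w) - p_d - \frac{N-3}{N-6} \sum_{A \subset \{x,y,z,w\},\ |A| = 3} Q_A - \frac{(N-2)(N-3)}{(N-4)(N-5)} \sum_{A \subset \{x,y,z,w\},\ |A| = 2} Q_A.$$

Here we denote

$$Q_{\{x,y\}} = \frac{k-5}{\binom{N-2}{2}}\,(L_{xy} - p_L), \qquad Q_{\{x,y,z\}} = \frac{1}{N-3}\,(b(x,y,z) - p_b) - \frac{N-2}{N-4} \sum_{A \subset \{x,y,z\},\ |A| = 2} Q_A.$$

Furthermore,

$$p_d := \mathbf{E} d(X_1, X_2, X_3, X_4) = 3\binom{k}{4}\binom{N}{4}^{-1}, \qquad p_b := \mathbf{E} b(X_1, X_2, X_3) = 12\binom{k}{4}\binom{N}{3}^{-1}.$$

Using (3), (4) we obtain as $k, n^* \to \infty$

$$\frac{\sigma^2(U_2)}{p^6 q^2} \simeq \frac{k^5}{2}, \qquad \frac{\sigma^2(U_3)}{p^5 q^3} \simeq \frac{k^4}{2}, \qquad \frac{\sigma^2(U_4)}{(pq)^4} \simeq \frac{k^4}{8}. \qquad (9)$$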



2.4. Examples suggest that the linear part of the decomposition of $\mathcal{N}_{\mathcal{G}}$ vanishes, i.e. $U_1 = 0$ almost surely. This can be easily shown for arbitrary $\mathcal{G}$. Furthermore, different $U$-statistics contributing to Hoeffding's decomposition (2) can have the same stochastic order even in the case where the parameter $p = p(k)$ is bounded away from 0 and 1, i.e., for some $\varepsilon > 0$,

$$\varepsilon < p(k) < 1 - \varepsilon \qquad \text{as } k, n^* \to \infty. \qquad (10)$$

Thus, we have $\sigma^2(U_j) = \Theta(k^3)$ for $j = 2, 3$ in (8) and $\sigma^2(U_j) = \Theta(k^4)$ for $j = 3, 4$ in (9). Similarly, Hoeffding's decomposition of the number $\mathcal{N}_{K_4}$ of blue copies of the complete graph $K_4$ is the sum $\mathbf{E}\mathcal{N}_{K_4} + U_2 + \cdots + U_6$, where $\sigma^2(U_i) = \Theta(k^5)$ for $i = 2, 3$, and $\sigma^2(U_i) = \Theta(k^4)$ for $i = 4, 5, 6$ provided that (10) holds. We do not present formulas of the various parts of the decomposition of $\mathcal{N}_{K_4}$ here because the notation becomes awkward.

One may expect that, for a large class of graphs $\mathcal{G}$, the leading $U$-statistics of the decomposition ($U$-statistics having largest variances) of $\mathcal{N}_{\mathcal{G}}$ correspond to the components $P_2$ and $K_3$, i.e., they are $U_2$ and (in the case where $K_3 \subset \mathcal{G}$) $U_3$.



3. Asymptotic normality 

Recall that $\nu_1, \dots, \nu_N$ denotes a sequence of independent Bernoulli random variables with the same success probability (say) $p = \mathbf{P}\{\nu_i = 1\} = 1 - \mathbf{P}\{\nu_i = 0\}$. We shall assume that this sequence and the random permutation $(X_1, \dots, X_N)$ are independent.

3.1. Let us write the characteristic function of the distribution of $T$ in the Erdős-Rényi form using the decomposition (2). Denote

$$\widetilde{U}_k = \sum_{1 \le i_1 < \cdots < i_k \le N} g_k(X_{i_1}, \dots, X_{i_k})\,\nu_{i_1} \cdots \nu_{i_k}.$$

Replacing the factors $\nu_i$ by $w_i = (\nu_i - p)$ in the formula of $\widetilde{U}_k$ we obtain the random variable $U_k^*$. Note that (3) implies $\widetilde{U}_k = U_k^*$ almost surely. Denote $S = w_1 + \cdots + w_N$.

The distribution of $T - \mathbf{E}T$ coincides with the conditional distribution of the sum

$$T^* = U_1^* + \cdots + U_{n^*}^*$$

given the event $\{S = 0\}$. Therefore, the characteristic function

$$\mathbf{E}\exp\{it(T - \mathbf{E}T)\} = \frac{1}{2\pi\,\mathbf{P}\{S = 0\}} \int_{-\pi}^{\pi} \mathbf{E}\exp\{itT^* + isS\}\,ds. \qquad (11)$$

This way of representing the characteristic function of a linear statistic, like $U_1$, was used by Erdős and Rényi (1959).
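The mechanism behind this representation is that, for an integer-valued $S$, integrating $e^{isS}$ over $[-\pi, \pi]$ picks out the event $\{S = 0\}$. The sketch below illustrates only this conditioning step on $K_4$, using the uncentered 2-star count of the blue edges instead of $T^*$; the function names and the choice of statistic are assumptions made for illustration.

```python
import cmath
import math
from itertools import combinations, product


def two_star_count(blue):
    """Number of pairs of blue edges sharing a vertex."""
    return sum(1 for x, y in combinations(blue, 2) if set(x) & set(y))


def cf_direct(edges, n, t):
    """E exp(itT) for a uniformly distributed n-subset of blue edges."""
    subsets = list(combinations(edges, n))
    return sum(cmath.exp(1j * t * two_star_count(b)) for b in subsets) / len(subsets)


def cf_fourier(edges, n, t):
    """Same quantity via independent Bernoulli(p) colours and a Fourier integral in s."""
    N = len(edges)
    p = n / N
    prob_zero = math.comb(N, n) * p ** n * (1 - p) ** (N - n)   # P{S = 0}
    M = N + 1            # equispaced rule, exact for frequencies |j| <= N
    integral = 0
    for m in range(M):
        s = -math.pi + 2 * math.pi * m / M
        expectation = 0
        for nu in product((0, 1), repeat=N):
            weight = p ** sum(nu) * (1 - p) ** (N - sum(nu))
            blue = [e for e, bit in zip(edges, nu) if bit]
            S = sum(nu) - n
            expectation += weight * cmath.exp(1j * (t * two_star_count(blue) + s * S))
        integral += expectation * 2 * math.pi / M
    return integral / (2 * math.pi * prob_zero)


if __name__ == "__main__":
    edges = list(combinations(range(4), 2))
    print(cf_direct(edges, 3, 0.7), cf_fourier(edges, 3, 0.7))
```

The equispaced quadrature is used because the integrand is a trigonometric polynomial in $s$ of degree at most $N$, so no numerical integration error is introduced beyond rounding.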

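3.2. Here we prove Proposition 1.1. Denote $f(t) = \mathbf{E}\exp\{it(T - \mathbf{E}T)/\sigma(T)\}$. In view of (6) it suffices to show that the relation $p^2 q^2 k^3 \to \infty$ implies for every $t \in \mathbb{R}$ that $f'(t) + t f(t) \to 0$ as $k, n^* \to \infty$.

It follows from (5), (11) that

$$f(t) = \frac{1}{\lambda} \int_{-\pi}^{\pi} \mathbf{E} e^{iJ}\,ds, \qquad J = tH + sS, \qquad \lambda = 2\pi\,\mathbf{P}\{S = 0\},$$

$$H = \sum_{1 \le i < j \le N} h_{ij}, \qquad h_{ij} = g_{ij} w_i w_j, \qquad g_{ij} = \frac{1}{\sigma(T)}\,(L_{X_i X_j} - p_L).$$

Furthermore, by symmetry, we obtain

$$f'(t) = \frac{i}{\lambda} \int_{-\pi}^{\pi} \mathbf{E} H e^{iJ}\,ds = \frac{i}{\lambda} \binom{N}{2} \int_{-\pi}^{\pi} \mathbf{E} h_{12} e^{iJ}\,ds. \qquad (12)$$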



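Split $S = S_* + S_0$ and $H = h_{12} + H_* + H_0$, where $S_* = w_1 + w_2$ and

$$S_0 = \sum_{3 \le j \le N} w_j, \qquad H_* = \sum_{3 \le j \le N} (h_{1j} + h_{2j}), \qquad H_0 = \sum_{3 \le i < j \le N} h_{ij}.$$

Expanding in powers of $ith_{12}$, $itH_*$ and $isS_*$ we show that

$$\frac{1}{\lambda} \binom{N}{2} \int_{-\pi}^{\pi} \mathbf{E}\bigl(h_{12} e^{iJ} - it\,h_{12}^2 e^{iJ_0}\bigr)\,ds \to 0, \qquad (13)$$

where $J_0 = tH_0 + sS_0$. Therefore, we replace $\mathbf{E} h_{12} e^{iJ}$ by $it\,p^2 q^2\,\mathbf{E} g_{12}^2 e^{iJ_0}$ in the right integral (12).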

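Similarly, expanding in powers of $ith_{12}$, $itH_*$ and $isS_*$ we show that

$$\frac{1}{\lambda} \binom{N}{2}\,p^2 q^2 \int_{-\pi}^{\pi} \mathbf{E} g_{12}^2 \bigl(e^{iJ_0} - e^{iJ}\bigr)\,ds \to 0. \qquad (14)$$

Using (13) and (14) we replace $\mathbf{E} h_{12} e^{iJ}$ by $it\,p^2 q^2\,\mathbf{E} g_{12}^2 e^{iJ}$ in the right integral (12). Furthermore, by symmetry, we can replace $\binom{N}{2} \mathbf{E} g_{12}^2 e^{iJ}$ by $\mathbf{E} \sum_{1 \le i < j \le N} g_{ij}^2 e^{iJ} = a^2\,\mathbf{E} e^{iJ}$, since the number $a^2 := \sum_{1 \le i < j \le N} g_{ij}^2$ is non-random. Finally, invoking (4) we obtain

$$p^2 q^2 a^2 = p^2 q^2 \binom{N}{2} \frac{\mathbf{E}(L_{X_1 X_2} - p_L)^2}{\sigma^2(T)} = 1 + O(1/(Npq)).$$

Since $O(1/(Npq\lambda)) = O(1/\sqrt{Npq})$ we can replace $p^2 q^2 a^2\,\mathbf{E} e^{iJ}$ by $\mathbf{E} e^{iJ}$. We have shown that

$$\frac{i}{\lambda} \binom{N}{2} \int_{-\pi}^{\pi} \mathbf{E} h_{12} e^{iJ}\,ds = -\frac{t}{\lambda} \int_{-\pi}^{\pi} \mathbf{E} e^{iJ}\,ds + o(1),$$

thus completing the proof.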

The proof of (13) and (14) is rather technical and laborious. We do not present it here and refer to an extended version of the paper (Bloznelis 2004). Let us mention that in the proof of (13) and (14) we apply techniques developed in Bloznelis and Götze (2002) for the analysis of the accuracy of the normal approximation of $U$-statistics based on samples drawn without replacement, see also Bentkus, Götze and van Zwet (1997), Helmers and van Zwet (1982).

3.3. Note that the orthogonal decomposition (projection method) was used by Janson and Nowicki (1991) to prove limit theorems for subgraph count statistics of Bernoulli random graphs. The present paper can be considered as an attempt to extend these techniques to subgraph count statistics in the random graph model $G(k,n)$. In contrast to the Bernoulli graph case, the subgraph count statistics studied here have decompositions with vanishing linear part. Therefore, known results on the central limit theorem for asymptotically linear statistics based on samples drawn without replacement (see e.g., Bloznelis and Götze (2002), Zhao and Chen (1990)) are not applicable. We show that the Erdős-Rényi (1959) representation (11) combined with Stein's method (1) can be used to establish the asymptotic normality.



References 

[1] Bentkus, V., Götze, F. and van Zwet, W. R. (1997) An Edgeworth expansion for symmetric statistics, Ann. Statist. 25, 851-896.

[2] Bloznelis, M. (2004) Some results on the orthogonal decomposition and asymptotic normality of subgraph count statistics, Preprint 2004-10, Faculty of Mathematics and Informatics, Vilnius University.

[3] Bloznelis, M. (2003) Orthogonal decomposition of symmetric functions defined on random permutations, Combinatorics, Probability and Computing, accepted for publication.

[4] Bloznelis, M. and Götze, F. (2001) Orthogonal decomposition of finite population statistic and its applications to distributional asymptotics, Ann. Statist. 29, 899-917.

[5] Bloznelis, M. and Götze, F. (2002) An Edgeworth expansion for symmetric finite population statistics, Ann. Probab. 30, 1238-1265.

[6] Erdős, P. and Rényi, A. (1959) On the central limit theorem for samples from a finite population, Publ. Math. Inst. Hungar. Acad. Sci. 4, 49-61.

[7] Helmers, R. and van Zwet, W. R. (1982) The Berry-Esseen bound for U-statistics, Statistical Decision Theory and Related Topics III, Vol. 1 (S.S. Gupta and J.O. Berger, eds.), Academic Press, New York, 497-512.

[8] Janson, S. (1990) A functional limit theorem for random graphs with applications to subgraph count statistics, Random Struct. Algorithms 1, 15-37.

[9] Janson, S., Łuczak, T. and Ruciński, A. (2000) Random graphs, Wiley-Interscience, New York.

[10] Janson, S. and Nowicki, K. (1991) The asymptotic distributions of generalized U-statistics with applications to random graphs, Probab. Theory Related Fields 90, 341-375.

[11] Ruciński, A. (1988) When are small subgraphs of a random graph normally distributed? Probab. Theory Related Fields 78, 1-10.

[12] Stein, Ch. (1970) A bound for the error in the normal approximation to the distribution of a sum of dependent random variables, Proc. Sixth Berkeley Symp. Math. Stat. Probab. 2, 583-602.

[13] Tikhomirov, A.N. (1976) On the rate of convergence in the central limit theorem for weak dependent variables, Vestn. Leningr. Univ. (Mat. Mekh. Astron.) 7, 158-159.

[14] Tikhomirov, A.N. (2001) On the central limit theorem, Vestn. Syktyvkar. Univ., Ser. 1, Mat. Mekh. Inform. 4, 51-76.

[15] Zhao, L. C. and Chen, X. R. (1990) Normal approximation for finite-population U-statistics, Acta Math. Appl. Sinica 6, 263-272.

Mindaugas Bloznelis 

Vilnius University and Institute of Mathematics and Informatics 

Mindaugas.Bloznelis@maf.vu.lt




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Avalanche Polynomials of Some 
Families of Graphs 

Robert Cori, Arnaud Dartois, and Dominique Rossin 



ABSTRACT: We study the Abelian sandpile model on different families of graphs. We introduce the avalanche polynomial, which enumerates the sizes of the avalanches triggered by the addition of a particle to a recurrent configuration. This polynomial is calculated for several families of graphs. In the case of the complete graph, the result involves some known results on parking functions [12, 11].



1. Introduction 

Bak, Tang and Wiesenfeld [2] introduced 15 years ago the concept of self-organized criticality, which makes it possible to describe a large variety of physical systems such as earthquakes [5, 14], forest fires and even some fluctuations in the stock market [1]. One version of this concept is the sandpile cellular automaton model, which uses a 2-dimensional lattice; particles are added to the sites of this lattice, giving rise to a toppling when their number in a site exceeds a given bound. A toppling on a site may be followed by the toppling of one or more of its neighbors, and this sequence of topplings is called an avalanche. Many authors have studied the distribution of the sizes (the number of topplings performed) of the avalanches for this model, showing that they obey power laws [10, 7, 13].

The sandpile model was also considered by combinatorialists as a game on a graph called the chip firing game [3, 4]. Relationships between the structure of the graph and the recurrent configurations of the physical model were pointed out in [6, 9].

Experiments on the distribution of sizes of the avalanches were considered 
only for the 2 dimensional lattice and for some classes of regular graphs. Very 
little is known for arbitrary graphs [8]. In this paper, a polynomial, encoding the 
avalanche sizes obtained by adding a particle to a site in a recurrent configuration, 
is associated to a graph. We determine this polynomial for various families of 
graphs. 

These families are the trees, the cycles, the complete graphs and the lollipop graphs. For these families of graphs the power law observed for the 2-dimensional grid no longer holds. The computation of the avalanche polynomial of the complete graph uses a bijection between recurrent configurations of this graph and the so-called parking functions. This computation allows the determination of the avalanche distributions on the lollipop graphs, which reveals the existence of peaks also observed in some other families of regular graphs.







2. Recurrent configurations of the sandpile model 

In this section we recall the main results on the sandpile model which are useful 
in this paper. 

In what follows $G = (X, E)$ is a connected multigraph with $n + 1$ vertices:

$$X = \{x_1, x_2, \dots, x_{n+1}\};$$

the vertex $x_{n+1}$ is distinguished and called the sink. A configuration of the sandpile model in this graph is a sequence of $n$ integers

$$u = (u_1, u_2, \dots, u_n).$$

For a configuration $u$, the integer $u_i$ will be considered as a number of particles placed on the vertex $x_i$.

A configuration is stable if the integers $u_i$ satisfy $0 \le u_i < d_i$, where $d_i$ is the degree of the vertex $x_i$. In a configuration which is not stable, a vertex $x_i$ with $u_i \ge d_i$ may perform a toppling, giving a new configuration $v$ such that $v_i = u_i - d_i$ and $v_j = u_j + e_{i,j}$ for $j \ne i$, where $e_{i,j}$ denotes the number of edges between vertices $x_i$ and $x_j$ in the multigraph $G$.

We will write the toppling of vertex $x_i$ by:

$$u \to_i v.$$

An avalanche is a sequence of topplings; we will use the notation

$$u \xrightarrow{*} v$$

for a configuration $v$ which is reached from $u$ after an avalanche.

The size of the avalanche is the number of topplings performed. Figure 1 below shows an avalanche of size 3; the sink is represented by a black vertex.




In the sandpile model the main operation consists in: 

1. taking a stable configuration 

2. adding a particle on one of the vertices 

3. performing topplings until a new stable configuration is obtained. 

The above example is an illustration of this operation for the graph $K_4$ and the configuration (2, 2, 2). It is not difficult to prove that the stable configuration obtained after a sequence of topplings from a given unstable configuration does not depend on the order in which these topplings are performed.
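The add-and-stabilize operation can be sketched as follows. The multigraph is encoded as a dictionary of dictionaries of edge multiplicities; this encoding and the function name `stabilize` are choices made for this illustration only.

```python
def stabilize(adj, sink, config):
    """Topple unstable non-sink vertices until the configuration is stable.

    adj[v][w] is the number of edges between v and w; config maps every
    non-sink vertex to its number of particles.
    Returns (stable configuration, number of topplings).
    """
    u = dict(config)
    degree = {v: sum(adj[v].values()) for v in adj}
    topplings = 0
    pending = [v for v in u if u[v] >= degree[v]]
    while pending:
        v = pending.pop()
        if u[v] < degree[v]:
            continue                      # already relaxed by an earlier toppling
        u[v] -= degree[v]
        topplings += 1
        for w, multiplicity in adj[v].items():
            if w != sink:                 # particles sent to the sink are lost
                u[w] += multiplicity
                if u[w] >= degree[w]:
                    pending.append(w)
        if u[v] >= degree[v]:
            pending.append(v)
    return u, topplings


if __name__ == "__main__":
    # K_4 with vertex 3 as sink, configuration (2, 2, 2) plus one particle on vertex 0
    k4 = {v: {w: 1 for w in range(4) if w != v} for v in range(4)}
    print(stabilize(k4, 3, {0: 3, 1: 2, 2: 2}))
```

Because the final configuration does not depend on the toppling order, the stack-based order used here is only one of many valid choices.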

The recurrent configurations of the sandpile model are the stable configura- 
tions which are met infinitely often when performing the above operations 2 and 
3. Note that not all the stable configurations are recurrent. The recurrent config- 
urations play a key role in the sandpile model; their number is the tree number of 
the underlying graph, hence independent of the vertex chosen as the sink. 







There are many characterizations of recurrent configurations and many structural results on them; in the sequel we will simply use the following characterization. Let $\pi$ denote the configuration such that $\pi_j$ is the number of edges between vertex $x_j$ and the sink.

Proposition 2.1. The configuration $u$ is recurrent if and only if it is stable and

$$\pi + u \xrightarrow{*} u,$$

where $\pi + u$ denotes the configuration $v$ such that $v_i = u_i + \pi_i$ for $1 \le i \le n$. Moreover in this avalanche every vertex of $G$ topples exactly once.



3. Avalanche polynomials: some simple examples 

Let $u$ be a recurrent configuration on the multigraph $G = (X, E)$; we consider the avalanche obtained when adding a particle on site $x_i$. We will call it a principal avalanche, and we denote by $av_G(u, x_i)$ the size of this principal avalanche. Note that a principal avalanche may be of size 0. This is the case when the particle is added on a site $x_i$ such that $u_i < d_i - 1$.

The three recurrent configurations of the cycle C3 are given below: 




Figure 2 . Recurrent configurations on C3. 



The sizes of the principal avalanches are 2, 2 for the first recurrent configuration, 1, 0 and 0, 1 for the two other configurations.

To any connected graph $G = (X, E)$ with a sink, we associate a polynomial enumerating the sizes of the principal avalanches. This polynomial is given by:

$$Av_G(x) = \sum_{k} a_k x^k,$$

where $a_k$ is the number of principal avalanches of size $k$.

For instance from the example in Figure 2 we obtain:

$$Av_{C_3}(x) = 2 + 2x + 2x^2.$$

Note that for any graph $G$, $Av_G(1)$ is equal to $n$ times the tree number of $G$. Thus it is independent of the sink chosen for $G$. Note, however, that the polynomial $Av_G$ itself does depend on this sink.
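A brute-force computation of the avalanche polynomial for a small graph can be sketched as follows, using the characterization of Proposition 2.1 to find the recurrent configurations. The graph is given as an edge-multiplicity matrix with the sink as last vertex; this encoding and the function names are assumptions of the sketch.

```python
from itertools import product


def stabilize(e, u):
    """Topple until stable; e is the (n+1)x(n+1) edge-multiplicity matrix, sink last."""
    n = len(u)
    deg = [sum(e[i]) for i in range(n)]
    u, size = list(u), 0
    while True:
        i = next((i for i in range(n) if u[i] >= deg[i]), None)
        if i is None:
            return tuple(u), size
        u[i] -= deg[i]
        size += 1
        for j in range(n):
            u[j] += e[i][j]


def recurrent_configurations(e):
    """Stable configurations u such that pi + u stabilizes back to u."""
    n = len(e) - 1
    deg = [sum(e[i]) for i in range(n)]
    pi = [e[j][n] for j in range(n)]
    return [u for u in product(*(range(d) for d in deg))
            if stabilize(e, [a + b for a, b in zip(u, pi)])[0] == u]


def avalanche_polynomial(e):
    """Coefficient list [a_0, a_1, ...] of the avalanche polynomial."""
    n = len(e) - 1
    counts = {}
    for u in recurrent_configurations(e):
        for i in range(n):
            v = list(u)
            v[i] += 1
            size = stabilize(e, v)[1]
            counts[size] = counts.get(size, 0) + 1
    return [counts.get(k, 0) for k in range(max(counts) + 1)]


if __name__ == "__main__":
    c3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
    print(avalanche_polynomial(c3))
```

The enumeration runs over all stable configurations, so it is only practical for graphs with a handful of vertices.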

3.1. Avalanche polynomials of trees 

If $G$ is a tree $T$ then it has only one recurrent configuration; this configuration $u_T$ is such that $(u_T)_i = d_i - 1$ for each vertex $x_i$. It is convenient to draw the tree in such a way that the sink is the root of the tree.

Adding a particle on vertex $x_i$ gives a sequence of topplings on all the vertices of the subtree of $T$ rooted at $x_i$; the avalanche ends there if $x_i$ is a son of the root. If $x_i$ is not a son of the root, then after this first sequence of topplings the father $x_j$ of $x_i$ in $T$ gets one particle and a new sequence of topplings can be performed. Hence we have:

$$av_T(u_T, x_i) = av_T(u_T, x_j) + t_i,$$

where $t_i$ is the number of vertices of the subtree with root $x_i$.

An example of the computation of avalanche sizes on a tree is given below; on 
each vertex is indicated the size of the avalanche obtained when adding a particle 
on it. 
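The recursion for trees (avalanche size at a vertex equals the size at its father plus the size of its own subtree, with size 0 attributed to the root) can be compared with a direct simulation. In this sketch the tree is described by a hypothetical `parent` dictionary whose missing key is the root, i.e. the sink.

```python
def tree_avalanche_sizes(parent):
    """Avalanche sizes from the recursion: size(child) = size(father) + |subtree(child)|."""
    children = {}
    for v, f in parent.items():
        children.setdefault(f, []).append(v)
    root = next(f for f in parent.values() if f not in parent)

    def subtree_size(v):
        return 1 + sum(subtree_size(c) for c in children.get(v, []))

    sizes = {}
    stack = [(root, 0)]
    while stack:
        v, inherited = stack.pop()
        for c in children.get(v, []):
            sizes[c] = inherited + subtree_size(c)
            stack.append((c, sizes[c]))
    return sizes


def simulate_tree_avalanche(parent, start):
    """Add one particle at `start` to the configuration u_i = d_i - 1 and count topplings."""
    children = {}
    for v, f in parent.items():
        children.setdefault(f, []).append(v)
    degree = {v: 1 + len(children.get(v, [])) for v in parent}
    u = {v: degree[v] - 1 for v in parent}
    u[start] += 1
    size = 0
    while True:
        v = next((v for v in u if u[v] >= degree[v]), None)
        if v is None:
            return size
        u[v] -= degree[v]
        size += 1
        for w in children.get(v, []) + [parent[v]]:
            if w in u:                    # the root (sink) absorbs particles
                u[w] += 1


if __name__ == "__main__":
    parent = {"a": "r", "b": "a", "c": "a", "d": "r"}
    print(tree_avalanche_sizes(parent))
    print({v: simulate_tree_avalanche(parent, v) for v in parent})
```

Both printed dictionaries should coincide; the small tree used here is an arbitrary example, not one of the trees of the figures.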




Figure 3. A tree Ti and the sizes of the principal avalanches. 



The avalanche polynomial on this tree is: 



Avti(x) = + X ® + 



Note that $Av_G(0) = 0$ if and only if $G$ is a tree. Indeed a graph which is not a tree has more than one recurrent configuration and necessarily at least one of them has a vertex $x_i$ with $u_i < d_i - 1$; for this vertex, adding a particle does not produce any toppling.

The avalanche polynomial does not characterize the tree since the avalanche polynomial of the tree $T_2$ below is also $Av_{T_1}(x)$.



^this example is due to Michel Marcus. 








Figure 4. A tree T 2 with the same avalanche polynomial as Ti. 

When we choose a different root, the avalanche polynomial also changes. From any tree or graph, we can define the set of avalanche polynomials for all different possible roots. This set characterizes neither the tree nor the graph. Indeed, the trees $T$ and $T'$ of Figure 6, built from $T_1$ and $T_2$, are not isomorphic but admit the same set of avalanche polynomials when the root ranges over the set of vertices.

The set $Av^T$ of polynomials $P$ such that there exists a tree $T$ satisfying $Av_T = P$ is the smallest set such that:

$$x \in Av^T \qquad (1)$$

$$P, Q \in Av^T \Rightarrow P + Q \in Av^T \qquad (2)$$

$$P \in Av^T \Rightarrow x^{a}(P + 1) \in Av^T, \quad \text{where } a = P(1) + 1. \qquad (3)$$




XX 2x 



Figure 5. Inductive computation of the avalanche polynomial of 
a tree. 

The proof is quite straightforward, since the two operations (2) and (3) correspond to two basic operations on trees (cf. Figure 5): sticking two trees together by merging their roots ($+$), and adding a new root connected only to the old one ($\Phi$). The result follows from the fact that any rooted tree can be obtained by these two operations starting from the two-vertex tree (whose avalanche polynomial is $x$).
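The two tree operations translate directly into operations on coefficient lists. The following sketch represents a polynomial as a list indexed by exponent; the names `merge_roots`, `add_root` and `X` are illustrative.

```python
def merge_roots(P, Q):
    """Operation (2): glue two rooted trees at their roots; the polynomials add."""
    length = max(len(P), len(Q))
    return [(P[i] if i < len(P) else 0) + (Q[i] if i < len(Q) else 0)
            for i in range(length)]


def add_root(P):
    """Operation (3): hang the tree below a new root; P becomes x^a (P + 1), a = P(1) + 1."""
    a = sum(P) + 1                      # P(1) counts the non-root vertices
    plus_one = [P[0] + 1] + list(P[1:])
    return [0] * a + plus_one


X = [0, 1]                              # avalanche polynomial of the two-vertex tree

if __name__ == "__main__":
    # path root - a - b: expected x^2 + x^3
    print(add_root(X))
    # root with sons a and d, where a has two leaf sons: expected x + x^3 + 2x^4
    print(merge_roots(add_root(merge_roots(X, X)), X))
```

The second example corresponds to a tree whose avalanche sizes, computed vertex by vertex from the recursion for trees, are 1, 3, 4 and 4.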




Figure 6. Trees T and T'. 

By means of the two operations $+$ and $\Phi$ we can show that $T$ and $T'$ have the same set of avalanche polynomials. We denote by $x_0$ (resp. $y_0$) the root of the tree $T$ (resp. $T'$), and by $x_1$ (resp. $x_2$) the root of its left sub-tree (resp. right sub-tree). The vertices defined in the same way on $T'$ are called $y_1$ and $y_2$. Among the other vertices of $T$, we denote by $x_3, x_4, \dots, x_{13}$ (resp. $x_{25}, x_{26}, \dots, x_{35}$) the ones belonging to the first copy of $T_1$ (resp. $T_2$) and by $x_{14}, x_{15}, \dots, x_{24}$ (resp. $x_{36}, x_{37}, \dots, x_{46}$) the ones belonging to the second copy of $T_1$ (resp. $T_2$). The corresponding vertices of $T'$ are called $y_i$, where the index $i$ is defined in the same manner (cf. Figure 6). Then, we can show that the avalanche polynomial of $T$ rooted in $x_i$ equals the avalanche polynomial of $T'$ rooted in $y_i$ for all indices $i$.

If $x_0$ is chosen as the root of $T$, we choose $y_0$ as root of $T'$. Then $T = \Phi(T_1 + T_1) + \Phi(T_2 + T_2)$ and $T' = \Phi(T_1 + T_2) + \Phi(T_1 + T_2)$. In terms of polynomials, we get $Av_T = Av_{T'}$, since $Av_{T_1} = Av_{T_2}$.

By symmetry of $(T, T')$ with respect to $T_1$ and $T_2$, it is sufficient to show that $Av_T = Av_{T'}$ when $x_i$ (resp. $y_i$) is chosen as root of $T$ (resp. $T'$), only for indices $i = 1$ or $3 \le i \le 24$.

If $x_1$ is chosen as root of $T$, we choose $y_1$ as root of $T'$. This time, $T = T_1 + T_1 + \Phi^2(T_2 + T_2)$ and $T' = T_1 + T_2 + \Phi^2(T_1 + T_2)$. But, in terms of polynomials, we still get $Av_T = Av_{T'}$.




Let us choose $x_i$ (resp. $y_i$) as root of $T$ (resp. $T'$), with $3 \le i \le 13$. Then there exists a sequence $s$ of operations $+$ and $\Phi$ that builds $T_1$ rooted in $x_i$ starting from $\{x_1\}$. If we set $Av_A = 0$ for the tree $A$ with one vertex, then, in terms of polynomials, the sequence $s$ applied to 0 leads to the avalanche polynomial of $T_1$ rooted in $x_i$. Hence we have $T = s(T_1 + \Phi^2(T_2 + T_2))$ and also $T' = s(T_2 + \Phi^2(T_1 + T_2))$. Consequently, $Av_T = Av_{T'}$. If $14 \le i \le 24$, the same argument applies, and we also get the desired result.

Hence, $T$ and $T'$ are non-isomorphic but have the same set of avalanche polynomials.

3.2. Avalanche polynomials of cycles 

Another simple example on which the avalanche polynomial can be computed easily is that of the cycle $C_{n+1}$; we have the following result:

Proposition 3.1. There exists a principal avalanche of size $k$ on $C_{n+1}$ if and only if there exist two integers $p, q$ such that $pq = k$ and $p + q \le n + 1$. Moreover the number of principal avalanches of size $k$ is equal to twice the number of couples $(p, q)$ such that $pq = k$ and $p + q \le n$ plus the number of couples such that $pq = k$ and $p + q = n + 1$.

Proof. The recurrent configurations on the cycle $C_{n+1}$ are the configurations in which all vertices have one particle except possibly one which has no particle; there are $n + 1$ such configurations. The recurrent configuration is determined by the distance from the sink to the vertex with no particle, and to obtain the size of the avalanche one has to consider where the particle has been added. Let us denote by $1^n$ the recurrent configuration consisting of a full sequence of 1's and by $1^p 0 1^q$ (with $p + q = n - 1$) those containing $n - 1$ 1's and one 0.

It is easy to check that the avalanche starting with $1^{p-1} 2\, 1^{q-1}$ has size $pq$ and the same is true for the avalanche starting with the configuration $1^{p-1} 2\, 1^{q-1} 0\, 1^{r}$ or with configuration $1^{r} 0\, 1^{p-1} 2\, 1^{q-1}$, giving the result. □




Figure 7. A recurrent configuration on Cg. 

Note that the number of couples $(u, x_i)$ such that $av_{C_{n+1}}(u, x_i) = 0$ is $n$, hence the constant term of $Av_{C_{n+1}}$ is $n$.

The avalanche polynomial of $C_8$ is given by

$$Av_{C_8} = 7 + 2x + 4x^2 + 4x^3 + 6x^4 + 4x^5 + 8x^6 + 2x^7 + 4x^8 + 2x^9 + 4x^{10} + 6x^{12} + 2x^{15} + x^{16}.$$







For instance the number of avalanches of size 6 is 8 since there are 4 couples $(p, q)$ with $pq = 6$ and $p + q \le 7$, namely: (1, 6), (6, 1), (2, 3), (3, 2); the number of avalanches of size 16 is 1 since the couple (4, 4) is the only one satisfying $pq = 16$ and $p + q = 8$ and there is no $p, q$ satisfying $pq = 16$ and $p + q \le 7$.
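The counting rule of Proposition 3.1 is easy to tabulate. The sketch below assumes the rule in the form "weight 2 for each ordered couple with $p + q \le n$, weight 1 for each ordered couple with $p + q = n + 1$", plus the constant term $n$ noted above; the function name is illustrative.

```python
def cycle_avalanche_counts(n):
    """Number of principal avalanches of each size on the cycle C_(n+1)."""
    counts = {0: n}                              # size-0 avalanches
    for p in range(1, n + 1):
        for q in range(1, n + 2 - p):            # ordered couples with p + q <= n + 1
            weight = 2 if p + q <= n else 1
            counts[p * q] = counts.get(p * q, 0) + weight
    return counts


if __name__ == "__main__":
    counts = cycle_avalanche_counts(7)           # the cycle C_8
    print(sorted(counts.items()))
    print(sum(counts.values()))                  # n times the number of recurrent configurations
```

For $n = 7$ the total is $7 \cdot 8 = 56$, the number of couples (recurrent configuration, site).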



4. Avalanche polynomials of the complete graph 

In the complete graph $K_{n+1}$ the recurrent configurations are in bijection with parking functions. Recall that a sequence of non-negative numbers $(w_1, w_2, \dots, w_n)$ is an $n$-parking function if there exists a permutation $a_1, a_2, \dots, a_n$ of $1, 2, \dots, n$ such that for all $i$, $0 \le w_i < a_i$ (cf. [12]). For instance the 16 3-parking functions are the permutations of the following sequences:

(0, 1, 2), (0, 0, 1), (0, 1, 1), (0, 0, 2), (0, 0, 0).
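The definition can be tested by sorting: a suitable permutation exists exactly when the $r$-th smallest entry (counting from 0) is at most $r$. A minimal enumeration sketch, with illustrative function names:

```python
from itertools import product


def is_parking_function(w):
    """True if some permutation a of 1..n satisfies 0 <= w_i < a_i for all i."""
    return all(value <= rank for rank, value in enumerate(sorted(w)))


def parking_functions(n):
    """All n-parking functions, as tuples with entries in 0..n-1."""
    return [w for w in product(range(n), repeat=n) if is_parking_function(w)]


if __name__ == "__main__":
    pfs = parking_functions(3)
    print(len(pfs))
    print(sorted({tuple(sorted(w)) for w in pfs}))
```

Grouping the result by sorted representative recovers the five classes listed above.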

A simple use of Proposition 2.1 gives

Proposition 4.1. The configuration $u_1, u_2, \dots, u_n$ of $K_{n+1}$ is recurrent if and only if $n - 1 - u_1, n - 1 - u_2, \dots, n - 1 - u_n$ is a parking function. Hence the number of recurrent configurations on $K_{n+1}$ is $(n + 1)^{n-1}$.

Moreover in this correspondence the number of saturated vertices in $u$ is equal to the number of 0's in the associated parking function.

The number of avalanches of size 0 in $K_{n+1}$ is equal to the number of non-zero elements in all the $n$-parking functions. To compute this number we need the following lemma (cf. [11]):



Lemma 4.2. The number of parking functions containing $k$ 0's is:

$$\binom{n-1}{n-k}\,n^{n-k} = \frac{k}{n}\binom{n}{k}\,n^{n-k}.$$

Proof. Consider the set $U_n$ of all sequences $u$ of $n$ integers containing $k$ 0's and $n - k$ numbers $1 \le u_i \le n$; clearly there are $\binom{n}{k} n^{n-k}$ such sequences. All parking functions with $k$ 0's are in $U_n$ but the converse is not true. To determine if $u \in U_n$ is a parking function we apply the following parking algorithm:

Consider a parking lot with $n$ slots numbered $1, 2, \dots, n$ lying on a circle. For each $i$ such that $u_i > 0$ put a car on the first free slot starting from position $u_i$ and going around the circle.

The sequence $u$ is a parking function if and only if the slot $n$ is free at the end of the algorithm.

For instance the algorithm applied to the sequence 2, 5, 0, 5, 5, 0, 0 fills the slots 2, 5, 6, 7, and the slots 1, 3, 4 are free; the sequence is not a 7-parking function.

After the execution of the algorithm there are $k$ free slots, and by symmetry each of the slots $1, 2, \dots, n$ has an equal probability of being free; hence the probability of the slot $n$ to be free is $k/n$, giving the result. □
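The circular parking algorithm of the proof can be written down directly and used to check the count for small $n$. This sketch assumes the closed form $\binom{n-1}{n-k} n^{n-k}$ for the lemma; function names are illustrative.

```python
from itertools import product
from math import comb


def occupied_slots(u, n):
    """Each entry u_i > 0 takes the first free slot, starting at u_i and going around 1..n."""
    taken = set()
    for pos in u:
        if pos > 0:
            while pos in taken:
                pos = pos % n + 1        # next slot on the circle
            taken.add(pos)
    return taken


def is_parking_by_algorithm(u, n):
    """Criterion of the proof: slot n must remain free."""
    return n not in occupied_slots(u, n)


def count_with_k_zeros(n, k):
    """Number of sequences over 0..n with exactly k zeros accepted by the algorithm."""
    return sum(1 for u in product(range(n + 1), repeat=n)
               if u.count(0) == k and is_parking_by_algorithm(u, n))


if __name__ == "__main__":
    print(sorted(occupied_slots((2, 5, 0, 5, 5, 0, 0), 7)))
    n = 4
    for k in range(1, n + 1):
        print(k, count_with_k_zeros(n, k), comb(n - 1, n - k) * n ** (n - k))
```

The enumeration ranges over all $(n+1)^n$ sequences, so it is meant for small $n$ only.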

From the above Lemma we obtain:

Proposition 4.3. The number of principal avalanches of size 0 in $K_{n+1}$ is

$$n(n-1)(n+1)^{n-2}.$$







Proof. It suffices to check that the number of couples $(u, x_i)$ such that u is an
n-parking function and $u_i = 0$ is equal to $2n(n+1)^{n-2}$; the result is then obtained by
subtracting this value from the total number of couples $(u, x_i)$, which is $n(n+1)^{n-1}$.
By the Lemma above, the number of couples we are considering is

$$\sum_{k=1}^{n} k \binom{n-1}{n-k}\, n^{n-k}.$$

Denote $f_n(x) = x(n+x)^{n-1}$; we have:

$$f_n(x) = \sum_{k=1}^{n} \binom{n-1}{n-k}\, n^{n-k}\, x^{k}$$

and

$$\sum_{k=1}^{n} k \binom{n-1}{n-k}\, n^{n-k} = f_n'(1) = (n+1)^{n-1} + (n-1)(n+1)^{n-2} = 2n(n+1)^{n-2},$$

giving the result. □
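
The count $n(n-1)(n+1)^{n-2}$ can be checked by brute force for small n. The sketch below assumes the standard characterisation of parking functions with entries in 0, ..., n − 1 (the i-th smallest entry, counted from 0, is at most i); the function names are ours.

```python
from itertools import product


def is_parking_function(u):
    """Standard test: after sorting, the entry at index i (from 0) is at most i."""
    return all(v <= i for i, v in enumerate(sorted(u)))


def count_nonzero_entries(n):
    """Total number of non-zero entries over all n-parking functions."""
    return sum(
        sum(1 for v in u if v != 0)
        for u in product(range(n), repeat=n)
        if is_parking_function(u)
    )


def count_parking_functions(n):
    """Number of n-parking functions, for comparison with (n+1)^(n-1)."""
    return sum(1 for u in product(range(n), repeat=n) if is_parking_function(u))
```

For n = 4 this gives 125 parking functions and 300 non-zero entries, in agreement with the size-0 total listed for $K_5$.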



To enumerate the principal avalanches of positive sizes we have:

Proposition 4.4. The number of principal avalanches of size m > 0 in $K_{n+1}$ is:

$$n \binom{n-1}{m-1}\, m^{m-2}\, (n-m+1)^{n-m-1}.$$



Proof. In order to prove this result we associate a subset and two different recurrent
configurations to any couple $(u, x_i)$ consisting of a recurrent configuration u and
a saturated site $x_i$ which gives a principal avalanche of size m > 0 when a particle
is added on site $x_i$:

• a subset J of m − 1 sites among the n − 1 sites of $K_{n+1}$ different from the
sink and from $x_i$: these are the sites which perform a toppling during the
avalanche triggered by the toppling of $x_i$,

• a recurrent configuration on $K_m$: consider the values $u_j$ for $x_j \in J$ and
subtract n − m + 1 from all these values; it is easy to check using Proposition 1
that it is a recurrent configuration,

• a recurrent configuration on $K_{n-m+1}$: the values $u_k$ for $x_k \notin J \cup \{x_i\}$
determine a recurrent configuration on this graph.

Conversely it is easy to build a principal avalanche of size m > 0 from a
subset J of m − 1 sites, a vertex $x_i$, a recurrent configuration on $K_m$ and another
one on $K_{n-m+1}$. □

Below are given the sizes of avalanches for the complete graph $K_5$. The recurrent
configurations which differ by a permutation of the number of particles on
the sites are considered as equivalent. For $K_5$, we represent each equivalence class
by the configuration satisfying $u_1 \ge u_2 \ge \ldots \ge u_n$; we also give the class sizes
and the number of principal avalanches of each size.







class   config.        0     1     2     3     4

 24     3, 2, 1, 0    72    24
 12     3, 2, 1, 1    36    12
 12     3, 2, 2, 0    36    12
 12     3, 2, 2, 1    36    12
  4     3, 2, 2, 2    12     4
 12     3, 3, 1, 0    24          24
  6     3, 3, 1, 1    12          12
 12     3, 3, 2, 1    24                      24
  6     3, 3, 2, 2    12                      12
 12     3, 3, 2, 0    24                24
  4     3, 3, 3, 0     4                12
  4     3, 3, 3, 1     4                      12
  4     3, 3, 3, 2     4                      12
  1     3, 3, 3, 3                             4

total                300    64    36    36    64
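
The totals 300, 64, 36, 36, 64 can be reproduced by direct simulation of the sandpile on $K_{n+1}$. The sketch below makes two assumptions that are ours rather than the text's: a stable configuration on the n non-sink sites (values 0, ..., n − 1) is taken to be recurrent when its sorted values satisfy $u_{(j)} \ge j - 1$, and the size of an avalanche is counted as the number of topplings.

```python
from collections import Counter
from itertools import product


def is_recurrent(u):
    """Assumed criterion on K_{n+1}: sorted values dominate 0, 1, ..., n-1."""
    return all(v >= j for j, v in enumerate(sorted(u)))


def avalanche_size(u, i):
    """Add one particle on site i and count topplings until the pile is stable."""
    n = len(u)  # every non-sink site of K_{n+1} has degree n
    h = list(u)
    h[i] += 1
    size = 0
    unstable = [j for j in range(n) if h[j] >= n]
    while unstable:
        j = unstable.pop()
        if h[j] < n:
            continue  # stale entry, the site is already stable
        h[j] -= n  # one particle to each other site, one to the sink
        size += 1
        for l in range(n):
            if l != j:
                h[l] += 1
                if h[l] >= n:
                    unstable.append(l)
        if h[j] >= n:
            unstable.append(j)
    return size


def avalanche_distribution(n):
    """Number of principal avalanches of each size over all couples (u, x_i)."""
    counts = Counter()
    for u in product(range(n), repeat=n):
        if is_recurrent(u):
            for i in range(n):
                counts[avalanche_size(u, i)] += 1
    return dict(counts)
```

Calling `avalanche_distribution(4)` enumerates the 125 recurrent configurations of $K_5$ and the 500 couples $(u, x_i)$.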




The distribution of avalanche sizes of the graph $K_{21}$ is given in Figure 8.

[Plot; horizontal axis: avalanche size, 0 to 20.]

Figure 8. Distribution of principal avalanches on $K_{21}$: theoretical
prediction (square) and experiments (cross).




5. Avalanche polynomials of lollipop graphs 

In this section we consider avalanches on the lollipop graph $L_{m,n}$ with n + m + 1 vertices.
It consists of a path of vertices $x_0, x_1, x_2, \ldots, x_m$ leading to the complete graph
$K_{n+1}$ whose vertices are $x_m, x_{m+1}, \ldots, x_{m+n}$. It is more convenient to consider $x_0$
as the sink. In other words, we apply $\phi^m$ to $K_{n+1}$.

Figure 9. Lollipop graph $L_{m,n}$.

A recurrent configuration on this graph consists of a recurrent configuration
$u_{m+1}, u_{m+2}, \ldots, u_{m+n}$ on the complete graph, where the other vertices are saturated.
This gives:

$$u_m = n, \quad u_{m-1} = 1, \quad \ldots, \quad u_1 = 1.$$

The sizes of the principal avalanches are given by: 

Proposition 5.1. In the lollipop graph the sizes of the principal avalanches of the
recurrent configuration u are

• 0, if the particle is added on a non saturated vertex for u, 

• i{m + n) — if the particle is added to the i-th vertex of the path 

(i <m) starting from the sink, 

• l]adxn+i (^5 particle is added in a saturated vertex 

Xi for u. 

As an example the avalanche polynomial of the lollipop graph is : 

24 -h 16x^ -h + 16x^^ + 16x^2 + + 6x^® + 9x®^ 

A computation of the sizes of principal avalanches for the lollipop graph
$L_{10,20}$ gives the distribution shown in Figure 10.

Let us define the operator $Tr^k$ for every polynomial $P = \sum_{i=0}^{n} a_i x^i$ as follows:

$$Tr^k(P)(x) = \sum_{i=0}^{\min(k,n)} a_i x^i.$$
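
As a small illustration of the truncation operator, the sketch below represents a polynomial by its list of coefficients $[a_0, a_1, \ldots, a_n]$; this representation and the function name are our own choices.

```python
def truncate(coeffs, k):
    """Tr^k: keep the terms a_i x^i with i <= min(k, n), where n = len(coeffs) - 1."""
    n = len(coeffs) - 1
    return coeffs[: min(k, n) + 1]
```

For an arbitrary coefficient list such as `[24, 9, 6, 9]`, `truncate(..., 2)` keeps the terms of degree 0, 1 and 2, and a cut-off above the degree leaves the polynomial unchanged.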

Corollary 1. The avalanche polynomial of the lollipop graph Lm,n 'is: 

{Av L^^J{x) , 

where a = ^Yid is the line with fc + 1 vertices. 

Proof. The first term of the sum is obvious by the preceding proposition. The
second term corresponds to principal avalanches for couples $(u, x_i)$, where i < m,
i.e., where $x_i$ is a vertex of the path. The size of this principal avalanche is the
same if the vertices $x_j$ for j > i are all in line. Indeed, what matters when a particle
is added on $x_i$ is how many vertices are in the subgraph when deleting the path




[Plot; horizontal axis: avalanche size, 0 to 5000.]




Figure 10. Avalanche distribution for $L_{10,20}$: expected result
(dashed line), and experiments over 10^ computations (cross). 



xo, xi, . . . , For i < m, the avalanche size is at most a, thus 

gives the good exponents. Since there are $(n+1)^{n-1}$ recurrent configurations, we
get the result. □

In fact, this result could be generalized. Let G = (X, E) be a rooted graph.
We call dissipation of a principal avalanche the number d of particles that the sink
(root) receives during the avalanche. We associate a new polynomial enumerating
the principal avalanches by size and dissipation to any graph G; this polynomial
is given by:

$$Av_G(x, y) = \sum_{k,d} a_{k,d}\, x^k y^d,$$

where $a_{k,d}$ is the number of principal avalanches of size k and dissipation d.

If G is a tree, $Av_G(x, y)$ has a very simple expression: $Av_G(x, y) = Av_G(x)\,y$.
Indeed, every principal avalanche has dissipation 1.

If G is such that every vertex is connected to the sink, like $K_{n+1}$ for example,
$Av_G(x, y)$ also has a very simple expression: $Av_G(x, y) = Av_G(xy)$. Every principal
avalanche of size m admits m as dissipation, since every toppling gives a particle
to the sink.

Then, if we apply $\phi^m$ to G like we did with $K_{n+1}$ to obtain a lollipop graph,
we get a graph $G^m$, whose avalanche polynomial is:

= Avg{x,x°') + 

where a = — and \G\ = n + 1. In fact we have: 

AvGrn(x,y) = AvG{x,x°-y) + ^^^^^TT‘^{AvL^_^J{x)y. 

For the lollipop graph, it is particularly simple, since $Av_{K_{n+1}}(x, y) = Av_{K_{n+1}}(xy)$.








6. Concluding remarks 

In this paper, we have considered avalanche polynomials which encode the
distribution of the sizes of avalanches. We have seen that for the complete graph, we
have a huge peak for size 0. Now the tree defined in Figure 11 gives a peak for
size n.

Figure 11. Tree $T_n$.

The construction consisting of merging the sinks of two rooted graphs
translates into adding the avalanche polynomials. Applying this construction to the two
kinds of graphs considered above, we are able to build graphs having any shape of
distribution in a certain sense.

This informal remark can be formalized in a more precise result which will
be developed in a future work.



Acknowledgement 

We thank Michel Marcus for his example. 



References 

[1] P. Bak. How nature works - the science of SOC. Oxford University Press, 1997.

[2] P. Bak, C. Tang, and K. Wiesenfeld. Self-organized criticality: an explanation of 1/f
noise. Phys. Rev. Lett., 59, 1987.

[3] N. L. Biggs. Chip-firing on distance-regular graphs. Tech. Report LSE-CDAM-96-11,
CDAM Research Report Series, June 1996.

[4] N. L. Biggs. Chip-firing and the critical group of a graph. Journal of Algebraic
Combinatorics, 9(1):25-45, 1999.

[5] K. Chen, P. Bak, and S. P. Obukhov. Self-organized criticality in a crack propagation
model of earthquakes. Physical Review Letters A, 43:625-30, 1991.

[6] R. Cori and D. Rossin. On the sandpile group of dual graphs. European J. Combin.,
21(4):447-459, 2000.

[7] M. Creutz. Abelian sandpile. Computers in Physics, 5:198-203, 1991.

[8] A. Dartois and D. Rossin. Analysis of the distribution of the length of avalanches on
the sandpile group of the (n, k)-wheel. FPSAC, 2003.

[9] D. Dhar. Self-organized critical state of sandpile automaton models. Physical Review
Letters, 64:1613-1616, 1990.

[10] D. Dhar, P. Ruelle, S. Sen, and D. Verma. Algebraic aspects of Abelian sandpile
models. Journal of Physics A, 28:805-831, 1995.

[11] D. Foata and J. Riordan. Mappings of acyclic and parking functions. Aequationes
Math., 10:10-22, 1974.

[12] A. G. Konheim and B. Weiss. An occupancy discipline and applications. SIAM J.
Appl. Math., 1966.

[13] D. V. Ktitarev, S. Lübeck, P. Grassberger, and V. B. Priezzhev. Scaling of waves in
the Bak-Tang-Wiesenfeld sandpile model. Physical Review E, 61:81-92, 2000.

[14] A. Sornette and D. Sornette. Self-organized criticality of earthquakes. Europhys.
Lett., 9:197-202, 1989.

Robert Cori 

LABRI, Universite Bordeaux 1 
cori@labri.u-bordeaux.fr 

Arnaud Dartois 

LIX, Ecole Polytechnique 
dartois@lix.polytechnique.fr 

Dominique Rossin 

CNRS and LIAFA, Universite Paris 7 
rossin@liafa.jussieu.fr 




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Perfect Matchings in Random Graphs with 
Prescribed Minimal Degree 

Alan Frieze* and Boris Pittel^ 



ABSTRACT: We consider the existence of perfect matchings in random
graphs with n vertices (or n + n vertices in the bipartite case) and m random
edges, subject to a lower bound on minimum vertex degree. A random bipartite
graph without isolated vertices and m > n edges with high probability (whp)
has a perfect matching iff the average vertex degree is $0.5 \log n + \log\log n + c_n$,
$c_n \to \infty$ however slow. A random graph with minimum degree at least two whp
has a matching that matches all the vertices except “odd-man-out” vertices, one
per each isolated cycle of odd length, and one for the remaining vertex set if its
cardinality is odd. So, for n even, whp the random graph has a perfect matching
iff it does not have isolated odd cycles.



1. Introduction 



To quote from Lovász [17], “the problem of the existence of 1-factors (perfect
matchings), the solution of which (the König-Hall theorem for bipartite graphs
and Tutte’s theorem for the general case) is an outstanding result making this
probably the most developed field of graph theory”. Erdős and Rényi ([8], [9])
found a way to use these results for a surprisingly sharp study of existence of
perfect matchings in random graphs. For $B_{n,m}$, a random bipartite graph with
n + n vertices and $m = n(\ln n + c_n)$ random edges, they proved [8] that

$$\lim_{n\to\infty} \Pr(B_{n,m} \text{ has a perfect matching}) = \lim_{n\to\infty} \Pr(\delta(B_{n,m}) \ge 1),$$




where $\delta$ denotes minimum degree. Of course, minimum degree at least one is a
trivial necessary condition for the existence of a perfect matching. The Hall
theorem turned out to be perfectly tailored for use in combination with probabilistic
techniques, pioneered in [8] several years earlier. Even though Tutte’s theorem for
the non-bipartite case is considerably more involved, in [9] Erdős and Rényi
managed to extend the analysis to the random graph $G_{n,m}$, a random general graph


* Supported in part by NSF grant CCR-0200945.
† Supported in part by an NSF grant.







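with n vertices and $m = \frac{n}{2}(\ln n + c_n)$ edges, showing that

$$\lim_{\substack{n\to\infty \\ n\ \text{even}}} \Pr(G_{n,m} \text{ has a perfect matching}) = \lim_{n\to\infty} \Pr(\delta(G_{n,m}) \ge 1) =
\begin{cases}
0 & c_n \to -\infty, \\
e^{-e^{-c}} & c_n \to c, \\
1 & c_n \to \infty.
\end{cases}$$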

In both cases a perfect matching becomes likely as soon as one has sufficiently
many random edges for the minimum degree to be at least one with high
probability (whp). This has led researchers to consider the existence of perfect matchings
in models of a random graph in which the minimum degree requirement is
always satisfied. Perhaps the first result along these lines is due to Walkup [23]. He
considered a k-out model $B_{k\text{-out}}$ of a random bipartite graph, again with n + n
vertices $V_1 + V_2$. Here each vertex $v \in V_i$ “chooses” k random neighbours in its
complementary class $V_{3-i}$. Walkup showed that

$$\lim_{n\to\infty} \Pr(B_{k\text{-out}} \text{ has a perfect matching}) =
\begin{cases}
0 & k = 1, \\
1 & k \ge 2.
\end{cases}$$

Frieze [10] proved a non-bipartite version of this result, the argument being based
on Tutte’s theorem and considerably harder. Very recently Karoński and Pittel
[13] have proven whp the existence of a perfect matching in what they called the
$B_{(1+e^{-1})\text{-out}}$ graph, a subgraph of $B_{2\text{-out}}$, obtained from $B_{1\text{-out}}$ by letting each of
its degree 1 vertices select another random neighbor in the complementary class.
Observe that in all of these results [23], [10] and [13] the number of random edges
depends linearly on the number of vertices, and the minimum degree has been
raised to 2, in a sharp contrast with the case m being of order n log n. Here is why.
When there are order n ln n random edges, there are few vertices of degree 1 and
they are far apart. In sparser models, with minimum degree 1, whp there will be
a linear (in n) number of vertices of degree 1, and some two vertices of degree 1
will have a common neighbor, which rules out a perfect matching. In the case of
random regular graphs it turns out that minimum degree 3 is required, Bollobás
[2]: Let $G_r$ denote a random r-regular graph on vertex set [n], n even. Then

$$\lim_{n\to\infty} \Pr(G_r \text{ has a perfect matching}) =
\begin{cases}
0 & r = 2, \\
1 & r = 1 \text{ or } r \ge 3.
\end{cases}$$

The case r = 1 is trivial since then $G_r$ is itself a perfect matching of [n]. $G_2$ is whp
a collection of O(ln n) disjoint cycles and they will all have to be even for $G_2$ to
have a perfect matching. The meat of the result is therefore in the case $r \ge 3$ and
this follows from r-connectivity and Tutte’s theorem.

Another approach was considered by Bollobás and Frieze [5]. Let $\mathcal{G}_{n,m}^{\delta\ge k}$ denote
the set of graphs with vertex set [n], m edges and minimum degree at least k. Let
$G_{n,m}^{\delta\ge k}$ be sampled uniformly from $\mathcal{G}_{n,m}^{\delta\ge k}$. By conditioning on minimum degree 1,
say, we will need 50% fewer random edges to get a perfect matching whp: Let
$m = \frac{n}{4}(\ln n + 2\ln\ln n + c_n)$. Then

$$\lim_{\substack{n\to\infty \\ n\ \text{even}}} \Pr(G_{n,m}^{\delta\ge 1} \text{ has a perfect matching}) =
\begin{cases}
0 & c_n \to -\infty \text{ sufficiently slowly}, \\
e^{-\frac{1}{8}e^{-c}} & c_n \to c, \\
1 & c_n \to \infty.
\end{cases} \qquad (1)$$

The restriction “sufficiently slowly” may seem out of place, but bear in mind that if
n is even and m = n/2 then the probability of a perfect matching is 1. The precise
threshold between n/2 and $\frac{1}{4} n \ln n$ for the non-existence of a perfect matching
was not determined. Using the approach developed in the present paper for the
bipartite case, we have found that “sufficiently slowly” in (1) can be replaced
simply by “and m > n/2”. (For m = n/2 + 1, say, the likely graph, with minimum
degree 1 at least, consists of n/2 − 3 isolated edges, and two paths, each consisting
of 3 vertices.) The study in [5] was extended in Bollobás, Fenner and Frieze [3]
who considered the probability that $G_{n,m}^{\delta\ge k}$ has $\lfloor k/2 \rfloor$ disjoint Hamilton cycles plus
a further disjoint perfect matching if k is odd.

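In the present paper we continue this line of research. We first consider the
bipartite version of (1). Let $\mathcal{B}_{n,m}^{\delta\ge k}$ denote the set of bipartite graphs with vertex set
[n] + [n], m edges and minimum degree at least k. Let $B_{n,m}^{\delta\ge k}$ be sampled uniformly
from $\mathcal{B}_{n,m}^{\delta\ge k}$.

Theorem 1. Let $m = \frac{n}{2}(\ln n + 2\ln\ln n + c_n)$. Then

$$\lim_{n\to\infty} \Pr(B_{n,m}^{\delta\ge 1} \text{ has a perfect matching}) =
\begin{cases}
0 & c_n \to -\infty,\ m > n, \\
e^{-\frac{1}{4}e^{-c}} & c_n \to c, \\
1 & c_n \to \infty.
\end{cases} \qquad (2)$$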



(As in the case of $G_{n,m}^{\delta\ge 1}$ we observe that the threshold for m is reduced by
the factor of 2, compared to that of the random graph $B_{n,m}$.) The RHS expression
in (2) is the limiting probability that no two vertices of degree 1 have a common
neighbor. Thus, the probability that a perfect matching exists is (close to) 1 when
either m = n or $c_n$ is large, and the probability is very small for m everywhere
in between, except for $c_n$ not far to the left of 0.

The next natural question is: How many random edges are needed if we
constrain the minimum degree to be at least 2, so ruling out the possibility of two
vertices of degree 1 having a common neighbour?

In this paper we only consider the non-bipartite graphs. To cover both even
and odd values of the number of vertices, it is convenient, and natural, to say that
a graph G = (V, E) has a perfect matching if $\mu^*(G) = \lfloor |V|/2 \rfloor$, where $\mu^*(G)$ is the
maximum matching number of G.

Unlike the bipartite case, with a positive limiting probability the “sparse”
graph $G_{n,m}^{\delta\ge 2}$ may have (short) isolated odd cycles. This observation rules out a
“whp-type” result for probability of a perfect matching. Let X(G) stand for the
total number of odd isolated cycles in G. Clearly

$$\mu^*(G) \le \nu(G) := \left\lfloor \frac{|V| - X(G)}{2} \right\rfloor.$$

Let $X_n = X(G_{n,m}^{\delta\ge 2})$, $\mu_n^* = \mu^*(G_{n,m}^{\delta\ge 2})$ and $\nu_n = \nu(G_{n,m}^{\delta\ge 2})$.

Theorem 2. Let m = cn with lim inf c > 1. Then

$$\lim_{n\to\infty} \Pr(\mu_n^* = \nu_n) = 1,$$

and $X_n$ is, in the limit, Poisson($\lambda$),

$$\lambda = \lambda(c) := \frac{1}{4}\log\frac{1+\sigma}{1-\sigma} - \frac{\sigma}{2}, \qquad \sigma := \frac{\rho}{e^{\rho} - 1},$$

and $\rho$ satisfies

$$\frac{\rho(e^{\rho} - 1)}{e^{\rho} - 1 - \rho} = 2c.$$

In particular,

$$\lim_{n\to\infty} \Pr(G_{n,m}^{\delta\ge 2} \text{ has a perfect matching}) =
\begin{cases}
e^{-\lambda}, & \text{if } n \text{ even}, \\
e^{-\lambda} + \lambda e^{-\lambda}, & \text{if } n \text{ odd}.
\end{cases} \qquad (3)$$

Thus the subgraph obtained by deletion of isolated odd cycles whp has a
perfect matching. The RHS in (3) is the limiting probability that the total number
of isolated odd cycles is 0 (n even), or 1 (n odd). Notice that c = 1 corresponds to
the random 2-regular (non-bipartite) graph, which typically has $\Theta(\log n)$ isolated
cycles, both odd and even. Sure enough, the explicit term in the RHS of (3)
approaches zero as $c \downarrow 1$, since $\lambda \to \infty$.

Theorem 2 does leave open the case where the number of edges m = n + o(n)
and so it is not quite as tight as Theorem 1.

Here is an interesting application of Theorem 2. Consider the Erdős-Rényi
random graph G(n, m), m = cn, for lim inf c > 1/2, i.e. the supercritical phase.
By consecutive deletion of the vertices of degree at most 1, we obtain the 2-core,
the largest subgraph of G(n, m) with minimum degree at least 2. Let $\nu$, $\mu$ stand
for the number of vertices and the number of edges in the 2-core. Conditioned on
$\nu$, $\mu$, and the vertex set, the 2-core of G(n, m) is distributed as $G_{\nu,\mu}^{\delta\ge 2}$. Since whp
$\mu$, $\nu$ are of order n, and $\mu/\nu$ is bounded away from 1, we see that whp the 2-core
of the giant component of G(n, m) has a perfect matching.

Among other things, the proof of Theorem 2 is based on an asymptotic
analysis [1] of a matching algorithm initially discovered and studied by Karp and Sipser
[14]. A related analysis for the bipartite graph $B_{n,m}^{\delta\ge 2}$ is considerably more than a
technical extension of that in [1], basically because of some serious complications
due to bipartiteness. It is shown in [11] that $B_{n,m}^{\delta\ge 2}$ has a perfect matching whp when
m = cn, c > 2 constant.

To conclude our discussion, for integer $k \ge 1$, let graph G have property $A_k$
if G contains $\lfloor k/2 \rfloor$ edge disjoint Hamilton cycles, and, if k is odd, a further edge
disjoint matching of size $\lfloor n/2 \rfloor$. Bollobás, Cooper, Fenner and Frieze [4] show that
for $k \ge 2$, there exists a constant $c_k \le 2(k+2)^3$ such that if $c \ge c_k$, then whp $G_{n,cn}^{\delta\ge k+1}$
has property $A_k$. Thus the current paper deals with the property $A_1$ and proves a
sharp result. It is reasonable to conjecture that the true value for $c_k$ is (k + 1)/2.
Note that if c = (k + 1)/2 and cn is integer then $G_{n,cn}^{\delta\ge k+1}$ is a random (k + 1)-regular
graph and this is known to have the property $A_k$ whp, Robinson and Wormald
[22], Kim and Wormald [15].
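
The constants of Theorem 2 are easy to evaluate numerically. The sketch below assumes that the formulas read $\rho(e^{\rho}-1)/(e^{\rho}-1-\rho) = 2c$, $\sigma = \rho/(e^{\rho}-1)$ and $\lambda = \frac14\log\frac{1+\sigma}{1-\sigma} - \frac{\sigma}{2}$, which is our reading of the statement; the function names are ours, and the bisection assumes c is not extremely close to 1.

```python
from math import exp, expm1, log


def mean_degree(rho):
    """Mean of a Poisson(rho) variable conditioned to be at least 2."""
    return rho * expm1(rho) / (expm1(rho) - rho)


def solve_rho(c):
    """Solve mean_degree(rho) = 2c by bisection (mean_degree increases from 2)."""
    target = 2.0 * c
    lo, hi = 1e-6, 1.0
    while mean_degree(hi) < target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_degree(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def poisson_parameter(c):
    """lambda(c) as read from Theorem 2 (an assumption of this sketch)."""
    rho = solve_rho(c)
    sigma = rho / expm1(rho)
    return 0.25 * log((1.0 + sigma) / (1.0 - sigma)) - 0.5 * sigma


def limit_probability(c, n_even=True):
    """Limiting probability of a perfect matching: e^-lam, or e^-lam (1 + lam) for odd n."""
    lam = poisson_parameter(c)
    return exp(-lam) if n_even else exp(-lam) * (1.0 + lam)
```

Expanding the logarithm gives $\lambda = \sigma^3/6 + \sigma^5/10 + \ldots$, so under this reading $\lambda$ is small unless c is close to 1, and the limiting probability is then close to 1.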



2. Enumerating some bipartite graphs. 

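In our probabilistic model, the sample space $\mathcal{B}_{n,m}^{\delta\ge k}$ is the set of all bipartite graphs
on the bipartition [n] + [n] with m edges, and the minimum degree at least k.
The probability measure is uniform, i.e. each sample graph is assigned the
same probability, $N_k(n,m)^{-1}$, where $N_k(n,m) = |\mathcal{B}_{n,m}^{\delta\ge k}|$. We will obtain a sharp
asymptotic formula for $N_k(n,m)$, as a special case for the number of bipartite
graphs meeting more general conditions on vertex degrees.

Let the $\nu_1$-tuple $c = (c_1, \ldots, c_{\nu_1})$ and the $\nu_2$-tuple $d = (d_1, \ldots, d_{\nu_2})$ of
nonnegative integers be given. Introduce $N_{c,d}(\nu, \mu)$, $\nu = (\nu_1, \nu_2)$, the total number of
bipartite graphs with $\mu$ edges, such that $a_i \ge c_i$ ($i \in [\nu_1]$), and $b_j \ge d_j$ ($j \in [\nu_2]$);
here $a_i$ and $b_j$ denote the degrees of the vertices on the two sides.
Of course, $N_{c,d}(\nu, \mu) = 0$ if $\mu < \sum_i c_i$ or $\mu < \sum_j d_j$, so we assume that
$\mu \ge \max\{\sum_i c_i, \sum_j d_j\}$. Define

$$G_c(x) = \prod_{i \in [\nu_1]} f_{c_i}(x), \qquad (4)$$

$$H_d(y) = \prod_{j \in [\nu_2]} f_{d_j}(y), \qquad (5)$$

where

$$f_t(x) = \sum_{\ell \ge t} \frac{x^{\ell}}{\ell!} = e^{x} - \sum_{\ell < t} \frac{x^{\ell}}{\ell!}. \qquad (6)$$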

The following estimates will be proved in Appendix A, along with some other 
lengthy computations. 



Lemma 2.1. Suppose that $\nu_1, \nu_2, \mu \to \infty$ are such that $\nu_1, \nu_2 = O(\mu)$ and
$\mu = O(\nu_i \log \nu_i)$, i = 1, 2. Let $G_c(x)$ and $H_d(y)$ be defined by (4) and (5).

(i): Suppose that fi~^ <h ri,r 2 — 0(log/i). Then 

Mn)Hd{T2) 



{rir2Y 



^-iEA(Y)EA(Z)pj.^^ = p)Vt{S = /i) -h 0(e' 



l^(log^ jLi) 



{nr2)i^^/vii'2r\r2 



) 

(7) 

( 8 ) 



the last estimate holding without the condition i'i^i' 2 ^P' 00 , where Yi = 

Po{ri \ > Ci), Zj ~ Po{r 2 \ > dj) are all independent, and R = S = 

.. ' 

(ii): Suppose also that max^ Ci = 0{1), maxj dj = 0(1), and fa > max{^^ q, 
Y2j dj}. Then there exist (unique) positive roots pi,p 2 of (100) and (101), 
and 

~ ^ ,g, 

\PlP 2 r 

where Yi = Po{pi-, >Ci),Zj = Po{p2', >Cj). 

Furthermore 

1 



Pt{R = p) 



(27rEiVar(y,))i/2’ 



or Pr(jR = p) rsj 



CTl 



dependent upon whether ai := P~^i Ci approaches infinity or stays bounded, with 
the analogous formula for Fr{S = p). 

Corollary 2.2. Suppose n = 0{m), kn <m = O(nlogn). Then 



( fkjpTPr {Y,jYi = m) 



exp - 



n" 

21 m? 



E"[(E)2 



(10) 



Nk{n^ m) ~ m! 







where Y is Poisson($\rho$; $\ge k$) such that EY = r = m/n. Note that

$$E[(Y)_2] =
\begin{cases}
\rho r, & k = 0, \\
\rho r, & k = 1, \\
\rho r/(1 - e^{-\rho}), & k = 2.
\end{cases}$$
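
These identities for the second factorial moment of a truncated Poisson variable can be checked numerically. The sketch below sums the series directly; the function name and the cut-off `terms` are our own choices.

```python
from math import exp


def truncated_poisson_moments(rho, k, terms=200):
    """Return (E Y, E[Y(Y-1)]) for Y distributed as Poisson(rho) conditioned on Y >= k."""
    term = 1.0  # rho^l / l! for l = 0, updated iteratively to avoid huge factorials
    z = mean = fact2 = 0.0
    for l in range(terms):
        if l >= k:
            z += term
            mean += l * term
            fact2 += l * (l - 1) * term
        term *= rho / (l + 1)
    return mean / z, fact2 / z


def predicted_second_factorial_moment(rho, k):
    """The three cases k = 0, 1, 2, with r the mean of the truncated variable."""
    r, _ = truncated_poisson_moments(rho, k)
    return rho * r if k < 2 else rho * r / (1.0 - exp(-rho))
```

For k = 1 the mean satisfies $r(1 - e^{-\rho}) = \rho$, which is the relation between $\rho$ and r = m/n used in the proof of Theorem 1.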



Further 



Pr 



(27rnVary)' 



if m - kn ^ 00 , 






if m — kn> 0 is fixed. 



As we will see, these results are all we need to evaluate (bound) the probabil- 
ities arising in the proofs of Theorem 1 and Theorem 2. We will also need a crude 
upper bound for the fraction of bipartite graphs in question, with the maximum 
degree exceeding m^. This bound is already implicit in the preceding analysis! 
Indeed, from (90), (92), (93), and the observation that the factor 



Pr^(^ Yi = m) exp 



jE^iiYh 



in (10) is exp(— 0(log^ n)), it follows that, for a' < a < 1/3, this fraction is 6“^"" 
at most. 

One is tempted to call this “overpowering both the conditioning and the fudge 
factor” . Needless to say, this trick would work for the counts (fractions) of other 
graph classes, as long as the degrees restrictions are so severe that the probability 
that Yi, Zj meet them is negligible compared to the factor in (14). 



3. Proof of Theorem 1 

We will use Hall’s necessary and sufficient condition for the existence of a perfect
matching in a bipartite graph to prove (1).

The random graph $B_{n,m}^{\delta\ge 1}$ has no perfect matching iff for some $k \ge 2$ there
exists a k-witness. A k-witness is a pair of sets $K \subseteq R$, $L \subseteq C$, or $K \subseteq C$, $L \subseteq R$,
such that |K| = k, |L| = k − 1 and $N(K) \subseteq L$. Here N(K) denotes the set of
neighbours of vertices in K. A k-witness is minimal if there does not exist $K' \subset K$,
$L' \subset L$ such that (K', L') is a k'-witness, where k' < k. It is straightforward
that if (K, L) is a minimal k-witness then every member of L has degree at least
two in $B(K \cup L)$, the subgraph of $B_{n,m}^{\delta\ge 1}$ induced by $K \cup L$. Therefore $B(K \cup L)$
has at least 2(k − 1) edges. We can restrict our attention to $k \le n/2$ since for
k > n/2 we can consider the pair $C \setminus L$, $R \setminus K$ instead. For $2 \le k \le n/2$, let $W_{n,k,\mu}$
denote the random number of minimal k-witnesses, such that $B(K \cup L)$ has $\mu$
edges, $\mu \ge 2(k-1)$. Actually, since $k \le n/2$, we also have $\mu \le m - n/2$.
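
For small graphs, Hall's condition and the notion of witness can be explored by brute force. The sketch below is an illustration under our own conventions: rows and columns are both numbered 0, ..., n − 1, edges are (row, column) pairs, and the function returns the neighbourhood N(K) of a violating set K (any set L of size |K| − 1 containing N(K) then gives a witness in the sense of the text).

```python
from itertools import combinations


def neighbourhoods(n, edges):
    """Adjacency sets of the rows and of the columns."""
    row_nb = [set() for _ in range(n)]
    col_nb = [set() for _ in range(n)]
    for r, c in edges:
        row_nb[r].add(c)
        col_nb[c].add(r)
    return row_nb, col_nb


def find_witness(n, edges):
    """Return (side, K, N(K)) with |N(K)| < |K|, or None if Hall's condition holds."""
    row_nb, col_nb = neighbourhoods(n, edges)
    for side, nb in (("R", row_nb), ("C", col_nb)):
        for k in range(1, n + 1):
            for K in combinations(range(n), k):
                N = set().union(*(nb[v] for v in K))
                if len(N) < k:
                    return side, K, sorted(N)
    return None


def has_perfect_matching(n, edges):
    """Augmenting-path matching, used here only as an independent cross-check."""
    row_nb, _ = neighbourhoods(n, edges)
    match_col = [-1] * n

    def try_row(r, seen):
        for c in row_nb[r]:
            if c not in seen:
                seen.add(c)
                if match_col[c] == -1 or try_row(match_col[c], seen):
                    match_col[c] = r
                    return True
        return False

    return all(try_row(r, set()) for r in range(n))
```

A typical 2-witness arises when two rows of degree 1 share their only neighbour, as in the edge set `{(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)}`.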

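(i) Suppose m = O(n log n) and $m \ge (1/3 + \epsilon) n \log n$, $\epsilon > 0$. Let us prove
that whp $B_{n,m}^{\delta\ge 1}$ has no k-witnesses with $k \ge 3$, i.e.

$$\Pr\Big( \sum_{k \ge 3,\ \mu \ge 2(k-1)} W_{n,k,\mu} = 0 \Big) \to 1, \qquad n \to \infty.$$

It suffices to show that

$$\sum_{k \ge 3,\ \mu \ge 2(k-1)} E_{n,k,\mu} \to 0, \qquad E_{n,k,\mu} := E(W_{n,k,\mu}). \qquad (15)$$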

fc>3,/Li>2(A:-l) 



Let us bound $E_{n,k,\mu}$. For certainty, suppose that $K \subseteq R$, $L \subseteq C$. We can choose
a pair (K, L) in $\binom{n}{k}\binom{n}{k-1}$ ways. (K, L) being a witness imposes the above listed
conditions on degrees of the subgraph induced by $K \cup L$. If we delete the row set
K, we get a remainder graph, which is a bipartite graph with bipartition (R', C),
$R' = R \setminus K$; it has $m - \mu$ edges and every vertex in $R' \cup (C \setminus L)$ has degree 1
at least. We bound $N_1$, the total number of those subgraphs, and $N_2$, the total
number of the remainder graphs, using Lemma 2.1 (i), emphasizing the possibility
to choose the corresponding parameters $r_1, r_2$ any way we want. The product of
these two bounds divided by the asymptotic expression for $N_1(n, m)$ in Corollary
2.2 provides an upper bound for the probability that (K, L) is a k-witness with
$\mu$ edges. Multiplying this bound by $2\binom{n}{k}\binom{n}{k-1}$ we get a bound for $E_{n,k,\mu}$. To
implement this program, we consider separately $k \le m^{\beta}$ and $k \ge m^{\beta}$, where
$\beta \in (0, 1)$ will be specified in the course of the argument.

Let $k \le m^{\beta}$. Pick $\alpha' < \alpha = (1 - \beta)/2$. From the note following Corollary 2.2,
with probability $1 - e^{-m^{\alpha'}}$ at least, the maximum vertex degree in the uniformly
random bipartite graph is $m^{\alpha}$ at most. So, backpedaling a bit, we will consider
$\mu \le m^{\gamma}$, ($\gamma := (1 + \beta)/2$), only. To bound $N_1$ we use (7) with $r_1 = \mu/k$, $r_2 = \mu/(k-1)$,
and to bound $N_2$ we use (8) with $r_1 = r_2 = \rho$. Here $\rho$ is the parameter
of $Y_i$ in Corollary 2.2, the root of $x f_0(x)/f_1(x) = r$, r := m/n, so that

$$\rho = r(1 - e^{-\rho}) < r, \qquad \rho = r - O(re^{-r}). \qquad (16)$$



The $r_i$ for $N_1$ seem natural, if one interprets them as parameters of Poissons
approximating the vertex degrees that should add to $\mu$ on either side of the subgraph
induced by $K \cup L$. Since k, $\mu$ are relatively small, $r_1 = r_2 = \rho$ should be expected
to deliver a good enough bound for $N_2$. Most importantly, this choice does the
job!

After cancellations and trivial tinkering, the resulting bound is 



En,k,iJ, ^6 
X 

X 



. o, (kV fk-iy 
Vm/ V M / 

exn ( 1 (n-k)E(Yh . (fc- l)p"+(»-fc+l)E(y )2 

^ 2 m—ii m—ji 



(17) 



Some explanation: k − 1 vertices from L in the remaining graph have degrees not
bounded away from zero, whence the factor $f_0(\rho)^{k-1} = e^{\rho(k-1)}$ in the second line,
and k − 1 usual Poissons ($\rho$), each with the second factorial moment equal $\rho^2$,
contributing $(k-1)\rho^2$ in the last line. Also, we have used $f_1(\mu/(k-1))^{k-1}$ where
we could have used the smaller $f_2(\mu/(k-1))^{k-1}$.







The last line fraction is of order 0(1), as E(y )2 = O(p^). Further, since 
log/i( 2 :) = log(e^ — 1) is concave. 



klogfi +{k- l)log/i (^-^) - (2^- l)log/i(p) < 

{2k - 1) (logfi - log/i(^)^ < {2k - l)(log/i)'(p) (^ 2 ^^ - 

<2n- {2k - l)p + ‘ipe-P. (18) 

Using the last observations and ji\ = 0(/i^^^(///e)^), we see that En,k,fi is of order 
E* ^ ^ at most, where 



e: 



^2k Ig kp 

^ fc!(fc-l)! ' m\p\ 
First, since 2(fc — 1) < ^ < 



• exp(3/ie ^). (19) 



fp* 

-^n,k,p-\-l 



kPp^ 



= iog2 






so that 



Second 

E: 



n,A:+l,2A: 



E' 



n,A:,2(fc— 1) 



^ > ^n,k,p ^ ^n,k,2{k—l)' 

2{k — l)<p<m'y 



rP {k + p'^ e ^ 

k{k + l){2k - l)2k{m -2k + l)(m -2k + 2)fc4(fc-i) 
<b = 0{p^e~P). 



Therefore 



E £;«(»-!) A 









3<k<m'y 

as m > (1/3 + 6)nlogn. In summary, 

= 0(n-3^). 



(20) 



3<k<m'^ 

2{k—l)<p<m'^ 



Consider now k > . This time we use (7) not only for Ni but for N 2 

as well, using for the latter ri = {pn — p)/{n — k) and V 2 = {pn — p)ln. That 
the latter are positive follows from p < m — n/2 and (16). (For /i, k not being 
relatively small anymore, the count N 2 of the remaining graphs would hardly be 
well bounded via the previous choice Vi = V 2 = p ^ m/n. What we have chosen 
turns out to be a working compromise between that old choice and the “naive” 







ri = {m — /d)/{n — k), V 2 = (m — /i)/n.) The resulting bound is 









= P 



Of 

C) 



p) 



n-k 
pn — ^ 



m—/j, 



n 



pn — li 



m—fi 



X - ^ 



flip) 



2n 



(We use the notation an <p bn to indicate that anfbn is polynomially large, at 
most.) Using again convexity of log/i( 2 :) and denoting h = {k — l){pn — p)/n^ we 
obtain that the logarithm of the last line fraction is less than 



2nlog/i (p-^ - 2nlog/i(p) + /i < -h{{log fi)' (p) - 1) = 



Thus the fraction is bounded, 1 at most, like its counterpart for k < m^. (Our 
search for the proper ri,r 2 was driven, in fact, by desire to make that fraction 
bounded again!) 

Introduce x = k/n, y = p/n. Using the Stirling formula for factorials, we 
obtain easily then that 

K,k,ix <p ^w{nH{x,y)), 

where 



H{u^v) = 2rlogp-h 2H{u) - rH{v/r) + 2v\ogu/v + (r 



v)log 



1 — u 

ip^' 



{u G (a;„,l/2], v G (0,/9)), rr„ := and 



$$H(w) = w \log(1/w) + (1 - w)\log(1/(1 - w)).$$



It follows that 



Hy{u,v) 

Hyy (tt, 



v(r — v)[l — u) p — v 

_1 2 1 _ 2{p-r) 

r — V p — v V {p — vy 



So Hy{u^v) decreases with v, and Hy{u,Q+) = oo, Hy{u^p—) = — oo. Hence, given 
tx, H(u^ v) attains its maximum at a unique root v{u) of the equation 



[r — v)[l-u)v \ p — v ) 



(22) 



By (16), p <T and p — r — 0{re ^) — > 0; so we should expect v{u) to be close to 
v*{u), the root of (22) with p replaced by r, i.e. 



v*(u) 



u 



2 



1 — li + 




(li < 1/2). Careful computations reveal that 



H*{u,v*{u)) = rlog(l - li + li^) 4- 2H{u), 







where H*{u^v) is obtained from H{u,v) by replacing p with r. Furthermore, as 
the RHS of (22) is 1 + 0(e“^), it can be shown that 

v{u) = + 0(e~^)). 

In this setting, strictly speaking, v{u) is also a function of p, and so is H(u^ 
both explicitly and implicitly, via v{u). Since Hy{u^v{u)) = 0, the derivative of 
H{u^v(u)) with respect to p is just the partial derivative, which is 

^ _ 2{r-v{u)) ^ 2v{u){p-r) ^ rx. 

P p-v{u) p{p-v{u)) 

therefore 

H{u,v{u)) = H''{u,v*{u)) + 0{e~'^{r - p)) = H*{u,v*{u)) + 0(re“^^). 
Using 

$$\log(1 - u + u^2) \le -u/2, \quad (u \le 1/2), \qquad (1 - u)\log(1 - u)^{-1} \le u,$$

we see that, for $u \in [m^{\beta}/n, 1/2]$ (and $m \ge (1/3 + \epsilon) n \log n$),

H*{u,v*{u)) < — n(r/2 — 2 — 21og(l/u)) 

< -u((l/6 + e/2) logn - 2(1 — /3) logn + O(loglogn)) 

< — culogn, 

where c = c{(3) > 0 if ^ > 11/12 — e/4. So, for this choice of (3, 

H{u, v{u)) = H*{u, '^*(^)) + 0{r~‘^^) 

< -cm^n'^ logn + logn) 

< 

This inequality shows that 

<P exp(-m^) ^ En^k,iJ. < exp(-0.5m^), 
as the fudge factor in (21) is only exp(0(log^ n)). Consequently 

< exp(-0.4m^). (23) 

vnP <k<nl2 
fj.<m 

Combining (20) and (23) we obtain 

^ ^ ^n,k,ijL H“ ^ ^ ^n,k,fi ~ 0 (tI ), 

3<k<m^ vn}l^<k<nl2 

2(fc— 1) fjb^m 

SO that 

Pm E E = 0 ) = 1 - 0(n-3^). (24) 

\fe>3|U>2(fc-l) / 

(ii) Turn now to the 2-witnesses. Prom (19), it follows that 

En,2,i <b = 0(nlog2ne-2-/") = 0(e-="), (25) 

with Cn defined by the notation 

71 

m = -(logn-|-21oglogn-|-c„). 







Case 1: $c_n \to \infty$.

Assume first that m < 2nlogn. Then (25) shows that, with probability more than 
1 - 1, there are no 2-witnesses. By (25), with probability 1 - 0(n“^/^) at 

least, there are no 3- witnesses either. Thus, with probability approaching 1, there 
exists a perfect matching. 

If m > 2nlogn then whp 5{Bn^rn) > 1 and Bn^m has a perfect matching. The 
result in this case follows immediately. 

Case 2: $c_n \to c \in (-\infty, \infty)$.

We want to prove that Wn, 2 ,i, the number of 2- witnesses, is, in the limit, Pois- 
son (e“^/4). We do so via the factorial moments method. To evaluate E(Wn, 2 ,i) 
sharply, we notice that in this case Ni — 2 ( 2)^1 exactly, and for N 2 we use the 
part (ii) of Lemma 1 with n = V 2 = p. So (compare to (17) 

~ ^ A := (26) 

4 4 

We need to show that, for each t>2 

lim E(W„, 2 .i)t = A‘, X = ]e-^. 

n— >oo 4 

To simplify our task, let us consider instead W* 2 1 ? the total number of vertex- 
disjoint 2-witnesses. The difference Wn, 2 ,i - W^n,2,i ^hp (24) at most, where 
Wn is the total number of subset pairs (A, L), A C i?, 1/ C (7, or A C C, L C i2, 
such that \K\ = 3, \L\ = 1, and L = N{K). Analogously to (17 ), 

EW„ <(, = 0(n-i/2). 



Therefore, Wn, 2 ,i = ^n, 2 ,i ^hh probability 1 — 0{n ^/^) at least, and it suffices 
to show that 

hm E(W; 2 ,l )2 = A^ t>l. (27) 

This is obviously true for $t = 1$. Let $t \ge 2$. Combinatorially, $(W^*_{n,2,1})_t$ is the total number of ordered $t$-tuples of (vertex-disjoint) 2-witnesses. Given $r + s = t$, let us compute $E_{rs}$, the expected number of $t$-tuples containing $r$ "2 rows, 1 column" (first kind) witnesses, and $s$ "2 columns, 1 row" (second kind) witnesses. The $r$ vertex-disjoint first kind witnesses can be chosen in $\binom{n}{2r}\binom{n}{r}(2r-1)!!\,r!$ ways. (Indeed, once $2r$ rows and $r$ columns are selected, we pair the rows in $(2r-1)!!$ ways and assign the formed $r$ pairs to $r$ columns in $r!$ ways.) Given any such choice, the $s$ second kind witnesses, disjoint among themselves and from the $r$ first kind witnesses, can be chosen in $\binom{n-r}{2s}\binom{n-2r}{s}(2s-1)!!\,s!$ ways. There are $t! = (r+s)!$ ways to order all $r + s$ witnesses. Hence $N_1(r, s)$, the total number of the ordered $t$-tuples of the "alleged" witnesses, is given by

$$N_1(r,s) = \binom{n}{2r}\binom{n}{r}(2r-1)!!\,r!\cdot\binom{n-r}{2s}\binom{n-2r}{s}(2s-1)!!\,s!\,(r+s)!.$$
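The counting step for first kind witnesses can be checked by brute force on tiny instances. The sketch below is only an illustration: the function names are hypothetical, rows and columns are labelled 0..n-1, and a first kind witness is modelled as a pair of rows together with one column. It compares the exhaustive count of sets of r vertex-disjoint triples with the product $\binom{n}{2r}\binom{n}{r}(2r-1)!!\,r!$.

```python
from itertools import combinations
from math import comb, factorial


def odd_double_factorial(k: int) -> int:
    """Return (2k-1)!! = 1*3*...*(2k-1); the empty product (k = 0) is 1."""
    out = 1
    for j in range(1, 2 * k, 2):
        out *= j
    return out


def count_first_kind_sets(n: int, r: int) -> int:
    """Exhaustively count sets of r vertex-disjoint (2 rows, 1 column) triples."""
    triples = [(pair, c) for pair in combinations(range(n), 2) for c in range(n)]
    total = 0
    for chosen in combinations(triples, r):
        rows = [v for pair, _ in chosen for v in pair]
        cols = [c for _, c in chosen]
        # Disjointness: all 2r rows distinct and all r columns distinct.
        if len(set(rows)) == 2 * r and len(set(cols)) == r:
            total += 1
    return total


def first_kind_formula(n: int, r: int) -> int:
    """Closed form quoted in the text for the same count."""
    return comb(n, 2 * r) * comb(n, r) * odd_double_factorial(r) * factorial(r)
```

For instance, `count_first_kind_sets(4, 2)` and `first_kind_formula(4, 2)` can be compared directly; the exhaustive search is only feasible for very small n and r.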



Deleting $2r$ rows and $2s$ columns involved in first kind and second kind witnesses respectively produces a bipartite graph with $m - 2t$ edges that meets the following conditions. (a) Every row (column) vertex not involved in the $s$ second kind (in the $r$ first kind) witnesses has degree at least 1. (b) No edge can be added to one of the (just deleted) $r + s$ 2-witnesses to form a pair $(K,L)$ such that $|K| = 3$, $|L| = 1$, $K \subset R$, $L \subset C$, or $K \subset C$, $L \subset R$, and $N(K) = L$. (This condition is necessary and sufficient for the $r + s$ 2-witnesses to be disjoint from all other 2-witnesses.) Denote the total number of such graphs by $N_2(r, s)$. Clearly $N_2(r,s) \le \hat N_2(r,s)$, where $\hat N_2(r,s)$ is the total number of bipartite graphs with the condition (b) dropped. Using (7) with $r_1 = r_2 = \rho$, we have



^ 2 (r, s) 



(m — 2t) 



j {eP - l)2n-3tgt^ 



EX(Y)EA(Z) 



p2(m-2t) 

■Pr{R = m — 2t)Pr{S — m — 2t) 0(e 



— log^ m 



))■ 



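Here $R = \sum_{i=1}^{n-2r} Y_i$, $S = \sum_{j=1}^{n-2s} Z_j$, where $Y_i, Z_j = \mathrm{Po}(\rho; \ge 1)$ for $1 \le i \le n-2r-s$, $1 \le j \le n-r-2s$, and $Y_i, Z_j = \mathrm{Po}(\rho)$ for $n-2r-s < i \le n-2r$, $n-r-2s < j \le n-2s$.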

Using (A.l) for both local probabilities, we obtain that the second line in the above 
formula is asymptotic to 

(nE(Fi) 2 )^' 



exp 



2m? 



1 

27 mVar(Fi) ' 



Thus 



Now 



7Vi(rz,m) ^ ^ ^ 



^ 2 (r, s) — N 2 {r, s) < r{n — 2r — s)N 2 ^\r, s) + s{n — r — 2 s)N 2 ^\r, s); 






(28) 



r(2)/ 



here $N_2^{(1)}(r, s)$ ($N_2^{(2)}(r, s)$ resp.) is the total number of the remaining graphs, such that a particular row (column resp.) vertex is incident to a single column (row resp.) vertex, which happens to be one of the vertices from the $r$ first kind ($s$ second kind resp.) witnesses. Consider $N_2^{(1)}(r, s)$. Deleting that row we get a graph with one fewer row vertex and one fewer edge. So, using (7) with $r_1 = r_2 = \rho$ and $e^\rho - 1 \sim e^\rho$, we obtain that



N^'\r,s) 
A/^i (n,m) 

^2^\r, s) 
Ni{n,m) 



<b 

<6 



y^2{r,s) 

Ni{n^m) mep' 

y^2{r,s) 

Ni{n^m) mep' 



Therefore 



N 2 (r,s) - N 2 {r,s) np‘ 



A'i(n,m) 

Collecting the pieces, we obtain that 



<b 



meP 



< pe 



0 . 



\r) 2^ 



p^ y 

m^e^p J 





i.e. 



t 



r=0 





= \K 







Thus $W^*_{n,2,1}$ is in the limit Poisson$(\lambda)$, and then so is $W_{n,2,1}$. Consequently, a perfect matching exists with the limiting probability equal to

$$\lim_{n\to\infty} \Pr(W_{n,2,1} = 0) = e^{-\lambda} = \exp\bigl(-e^{-c}/4\bigr).$$
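A small numerical illustration of Case 2 follows. It is a sketch under one explicit assumption: the parametrization is read as $m = (n/2)(\log n + 2\log\log n + c_n)$, which is the reading under which the two estimates in (25) agree. The function names are hypothetical.

```python
import math


def edges_at_offset(n: int, c: float) -> float:
    """Edge count m = (n/2)(log n + 2 log log n + c); assumed reading of the definition of c_n."""
    return 0.5 * n * (math.log(n) + 2.0 * math.log(math.log(n)) + c)


def limiting_matching_probability(c: float) -> float:
    """Limiting probability exp(-e^{-c}/4) that no 2-witness exists (Case 2)."""
    return math.exp(-math.exp(-c) / 4.0)


def two_witness_scale(n: int, m: float) -> float:
    """The quantity n log^2 n * exp(-2m/n) appearing in (25)."""
    return n * math.log(n) ** 2 * math.exp(-2.0 * m / n)
```

With this reading, `two_witness_scale(n, edges_at_offset(n, c))` equals $e^{-c}$ up to rounding, and `limiting_matching_probability(c)` increases from 0 to 1 as c runs from large negative to large positive values, matching Cases 1 and 3.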




Case 3: $c_n \to -\infty$, $m > n$.

3a: $m \ge (1/3 - \varepsilon)\,n\log n$.

In this case, after trivial modifications in the above derivation,

$$E(W^*_{n,2,1})_2 \sim \bigl(E\,W^*_{n,2,1}\bigr)^2 \to \infty,$$

and, by Chebyshev's inequality,

$$\lim_{n\to\infty} \Pr(W^*_{n,2,1} > 0) = 1.$$

So, whp, a perfect matching does not exist.

3b: $m \le (1/3 - \varepsilon)\,n\log n$, $m - n \to \infty$.

Note that $n\rho \to \infty$. Let $X_n$ denote the total number of isolated trees with 2 row vertices and 1 column vertex. ($X_n > 0$ implies that there is no perfect matching.) If $t$ of the $X_n$ trees are deleted, the remaining graph has $n - 2t$ row vertices, $n - t$ column vertices, and $m - 2t$ edges, and every vertex has degree 1, at least. Evaluating the number of such graphs by (7), we easily obtain



^{Xn)t 



yc) 

f mpe 



{2t-mt\) 



2 (m- 2 t)l p 



At 



ml {eP — 1) 



3t 



using the definition of $\rho$ for the second equality. Also from this definition, $\rho \sim 2(m - n)/n$ if $\rho \to 0$, and $\rho \le m/n$ always. So, if $\rho \to 0$,

$$m\rho e^{-3\rho} \sim 2m(m - n)/n \ge 2(m - n) \to \infty,$$

and, if $\lim \rho > 0$, then

$$m\rho e^{-3\rho} \ge n\rho\, e^{-3m/n} \ge \rho\, n^{3\varepsilon} \to \infty.$$

Thus

$$E(X_n) \to \infty, \qquad E(X_n)_2 \sim E^2(X_n),$$

so that (Chebyshev's inequality) $\Pr(X_n > 0) \to 1$. That is, whp there is no perfect matching.

3c: $\sigma := m - n > 0$ is fixed.

If we form $4n - 3m$ isolated edges, the remaining $3(m - n)$ row vertices and $3(m - n)$ column vertices can be partitioned into $2(m - n)$ trees of size 3, half of the trees each containing 2 row vertices and 1 column vertex, and the other half 1 row vertex and 2 column vertices. The total number of such bipartite graphs is

$$N^*(n,m) = \binom{n}{4n-3m}^{2}(4n-3m)!\cdot\left[\binom{3(m-n)}{2(m-n)}\,(2(m-n)-1)!!\,(m-n)!\right]^{2} = \frac{(n!)^2}{(n-3\sigma)!\,2^{2\sigma}(\sigma!)^2}. \tag{29}$$
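The two expressions in (29), as read here from the damaged display, can be compared with exact integer arithmetic. The sketch below assumes that reading (a squared binomial for the isolated edges, and a squared bracket for the two families of 3-vertex trees); the function names are hypothetical.

```python
from math import comb, factorial


def odd_double_factorial(k: int) -> int:
    """Return (2k-1)!! with the convention that the empty product is 1."""
    out = 1
    for j in range(1, 2 * k, 2):
        out *= j
    return out


def count_by_construction(n: int, m: int) -> int:
    """Left side of (29): choose the isolated edges, then build the trees of size 3."""
    sigma = m - n
    isolated = 4 * n - 3 * m  # number of isolated edges
    # Choose `isolated` rows and columns and match them up.
    edge_part = comb(n, isolated) ** 2 * factorial(isolated)
    # On each side: pick the 2*sigma vertices that occur in pairs, pair them,
    # and attach the pairs to the sigma centre vertices of the other side.
    tree_part = comb(3 * sigma, 2 * sigma) * odd_double_factorial(sigma) * factorial(sigma)
    return edge_part * tree_part ** 2


def count_closed_form(n: int, m: int) -> int:
    """Right side of (29): (n!)^2 / ((n - 3 sigma)! 2^(2 sigma) (sigma!)^2)."""
    sigma = m - n
    return factorial(n) ** 2 // (factorial(n - 3 * sigma) * 4 ** sigma * factorial(sigma) ** 2)
```

For example, with n = 7 and m = 9 (so $\sigma = 2$ and one isolated edge) both functions return 396900.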







As for $N_1(n, m)$, the total number of all bipartite graphs, by Corollary 2.2 and (109), it is given by

where, using the definition of $\rho$,



So, after simple computations. 



Since, for fixed cr, 



Ni(n, m) 



( n !)2 



22^((t!)2* 



(30) 



min 






(n — 3cr)\ 

it follows from (29) and (30) that, with probability approaching 1, the random graph has $2\sigma > 0$ isolated trees of size 3, thus no perfect matching exists.

Theorem 1 is proved completely. □ 



4. Proof of Theorem 2 

We notice upfront that, for $\liminf m/n > 1$, whp the random graph $G^{\delta\ge 2}_{n,m}$ has an almost perfect matching, in the sense that

$$\lim_{n\to\infty} \Pr\bigl(\mu^*(G^{\delta\ge 2}_{n,m}) < \lfloor n/2\rfloor - n^{0.2+\beta}\bigr) = 0, \tag{31}$$

for every $\beta > 0$. This follows from the analysis of the Karp-Sipser matching algorithm (its Phase 2, to be precise) [14], given in [1]. The analysis of [1] shows that at most $n^{0.2+o(1)}$ vertices that are left at the end of Phase 1 are not covered by the matching constructed in Phase 2. The random graph at the end of Phase 1 is uniform, subject to the number of vertices $\nu$, the number of edges $\mu$, and $\delta \ge 2$. The analysis is robust with respect to these parameters and implies (31). So, loosely speaking, our task is to get rid of the term $-n^{0.2+\beta}$.

First we prove (Lemma 4.1) that, analogously to the bipartite case, a graph with minimum vertex degree 2 at least, which has no perfect matching, must contain a certain (witness) subgraph. This result is based on the ideas of Edmonds' matching algorithm, [17] (Section 7, Exer. 34). Conditioned by the proof of Theorem 1, one would expect to be able to show that whp the random graph in question does not contain such a witness. Indeed, our next Lemma 4.2 rules out (whp) all the witnesses of size $\varepsilon n$ at most, $\varepsilon > 0$ being sufficiently small. As in the proof of Theorem 1, our argument consists of showing that the expected count of "small" witnesses is exponentially small. However, we have not been able to extend the proof to larger witnesses. Apparently, for the sparse graphs in question, the expected count of witnesses can be exponentially large, even though the count itself is zero whp.

Not everything is lost however! As the next step we show (Lemma 4.3) that whp either the random graph has a perfect matching, or there are $an^2$ pairs of disjoint vertices such that adding any one of these pairs to the edge set of the subgraph, obtained by deletion of isolated odd cycles, increases the maximum matching number. This fact and a coupling device, that allows us to relate, approximately, the random graphs $G^{\delta\ge 2}_{n,m_1}$ and $G^{\delta\ge 2}_{n,m_2}$ to each other, enable us to prove a certain monotonicity of the distribution of $\mu^*(G^{\delta\ge 2}_{n,m})$ as a function of $m$, Lemma 4.8. We combine this monotonicity property of the maximum matching number and (31) to complete the proof of Theorem 2.

4.1. Step 1. Profiling and counting the witnesses.

We begin with

Lemma 4.1. Let $G = (V,E)$ be a graph with $\delta(G) \ge 2$, and with no isolated odd cycles, which does not have a perfect matching, i.e. $\mu^*(G) < \lfloor |V|/2\rfloor$. For every $x \in V$ which is not covered by at least one maximum matching, there exists a witness $(K,L) = (K(x),L(x))$, $K, L \subset V$, such that

(i): $|K| = |L| + 1$;

(ii): $N_G(K) = L$, where $N_G(K) = \{w \notin K : (v,w) \in E_G \text{ for some } v \in K\}$;

(iii): $|E_G(K \cup L)| \ge |K| + |L| + 1$;

(iv): each $v \in L$ has at least 2 neighbours in $K$;

(v): for every $y \in K$, there exists a maximum matching that does not cover $y$;

(vi): adding any $(x,y)$, $y \in K(x)$, to $E$ increases the size of a maximum matching.

Proof. Let $x \in V$ and let $M$ be a maximum matching which does not cover $x$. Since $\mu^*(G) < \lfloor |V|/2\rfloor$, there exists $s \ne x$ which is also left uncovered by $M$. Now let $T$ be a tree of maximal size which is rooted at $s$ and such that for each $v \in T$, the path from $s$ to $v$ in $T$ is alternating with respect to $M$. Let $K$, $L$ be the sets of vertices at even and odd distance respectively from $s$ in $T$. For every $y \in K$, we can switch edges on the even path from the root to $y$ to obtain another maximum matching that does not cover $y$. Furthermore, no leaf of $T$ is in $L$, since otherwise switching edges along the odd-length path to such a leaf we would have increased the size of the matching. Therefore all the vertices of $T$, except $s$, are covered by $M$. Next, if a neighbor $u$ of a vertex from $K$ is not in $K \cup L$, then $u$ must be covered by $M$, which contradicts maximality of $T$. Therefore the pair $(K,L)$ meets all the conditions, except possibly (iii). Using the fact that all the leaves of $T$ are from $K$ and that their neighbors must lie in $K \cup L$, and $\delta(G) \ge 2$, we can assert that

$$|E_G(K \cup L)| \ge |E(T)| + 1 = |K| + |L|.$$

But if $|E_G(K \cup L)| = |K| + |L|$ then $T$ consists of two even-length paths, sharing the root $s$ only, with the leaves forming an edge in $G$. Thus $s$ has degree 2 in $G$, and $K \cup L$ induces an odd cycle in $G$. As there are no isolated odd cycles in $G$, there must be some edge $(v, w)$, $v \in L$, $w \notin K \cup L$. Since $v$ is covered by $M$, $(v, w) \notin M$ and, for some $x \notin K \cup L$, we have $(w,x) \in M$. It is easy to see how to alter $M$ solely on $E(K \cup L)$ to obtain a maximum matching $M'$ and the corresponding tree $T'$ rooted at $v$ instead of $s$. (Draw a picture!) The degree of $v$ in $T'$ is at least 3, so $T'$ has at least 3 leaves and, for the corresponding $K = K(T')$, $L = L(T')$, the condition (iii) is satisfied, too. □
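Conditions (i)-(iv) of Lemma 4.1 are purely structural, so they can be tested mechanically for a candidate pair (K, L). The sketch below is an illustration only: the helper names and the 8-vertex toy graph are hypothetical, and the maximum matching is found by exhaustive recursion, which is adequate only for very small graphs.

```python
def build_adjacency(edges):
    """Map each vertex to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_witness(adj, K, L):
    """Check conditions (i)-(iv) of Lemma 4.1 for disjoint vertex sets K and L."""
    K, L = set(K), set(L)
    if K & L or len(K) != len(L) + 1:                        # (i)
        return False
    if {w for v in K for w in adj[v]} - K != L:              # (ii): N(K) = L
        return False
    inside = K | L
    induced = {frozenset((u, w)) for u in inside for w in adj[u] if w in inside}
    if len(induced) < len(K) + len(L) + 1:                   # (iii)
        return False
    return all(len(adj[v] & K) >= 2 for v in L)              # (iv)


def max_matching_size(edges):
    """Exhaustive maximum matching size: either skip the first edge or take it."""
    if not edges:
        return 0
    (u, v), rest = edges[0], edges[1:]
    skip = max_matching_size(rest)
    take = 1 + max_matching_size([e for e in rest if u not in e and v not in e])
    return max(skip, take)


# Hypothetical toy graph: K = {2, 3, 4} is joined completely to L = {0, 1},
# and a triangle {5, 6, 7} hangs off vertex 0, so the minimum degree is 2.
EXAMPLE_EDGES = [(k, l) for k in (2, 3, 4) for l in (0, 1)] + [(5, 6), (6, 7), (5, 7), (0, 5)]
EXAMPLE_ADJ = build_adjacency(EXAMPLE_EDGES)
```

In this toy graph the three vertices of K compete for the two vertices of L, so at least one of them stays unmatched and the maximum matching has 3 edges rather than 4; `is_witness(EXAMPLE_ADJ, {2, 3, 4}, {0, 1})` confirms that the obstruction has the shape described in the lemma.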

Turn to $G^{\delta\ge 2}_{n,m}$. Given $\varepsilon > 0$, let $A_n(\varepsilon)$ denote the event that there exist $K$, $L$ satisfying (i)-(iv), and such that $|K| \le \varepsilon n$. The following lemma implies that whp witnesses must be large.







Lemma 4.2. Let liminf m/n > 1. There exists an e > 0 such that Pr(yin(e)) = 



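Proof. First of all, using $\liminf m/n > 1$, we have, [21]: $N(n, m)$, the total number of graphs with minimum degree at least 2, is asymptotic to

$$N_0(n,m) = \frac{(2m-1)!!\,f_2(\rho)^n}{\rho^{2m}\sqrt{2\pi n\,\mathrm{Var}\,Z}}\cdot\exp(-\eta/2 - \eta^2/4). \tag{32}$$

Here $\rho$, $\eta$ satisfy

$$\frac{\rho f_1(\rho)}{f_2(\rho)} = \frac{2m}{n}, \qquad \eta = \frac{\rho f_0(\rho)}{f_1(\rho)}, \tag{33}$$

i.e. $\rho$ is bounded away from 0 and $\infty$, and $Z$ is Poisson$(\rho)$, conditioned on $Z \ge 2$.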
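The first relation in (33) determines $\rho$ implicitly. A minimal numerical sketch is given below, assuming $f_k(x) = \sum_{j\ge k} x^j/j!$ (so that $f_1(\rho) = e^\rho - 1$ and $f_2(\rho) = e^\rho - 1 - \rho$, as used later in the proof of Lemma 4.6). The function names are hypothetical, and bisection is just one convenient way to solve the equation.

```python
import math


def f(k: int, x: float, terms: int = 200) -> float:
    """Truncated series for f_k(x) = sum_{j >= k} x^j / j!; avoids cancellation near x = 0."""
    term = x ** k / math.factorial(k)
    total, j = 0.0, k
    for _ in range(terms):
        total += term
        j += 1
        term *= x / j
    return total


def truncated_mean(rho: float) -> float:
    """Mean rho*f_1(rho)/f_2(rho) of a Poisson(rho) variable conditioned on being >= 2."""
    return rho * f(1, rho) / f(2, rho)


def solve_rho(c: float) -> float:
    """Solve truncated_mean(rho) = c by bisection; requires c > 2, i.e. m/n > 1."""
    if c <= 2.0:
        raise ValueError("the equation has a positive root only for c = 2m/n > 2")
    lo, hi = 1e-6, 1.0
    while truncated_mean(hi) < c:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def eta(rho: float) -> float:
    """Second relation in (33): eta = rho*f_0(rho)/f_1(rho)."""
    return rho * f(0, rho) / f(1, rho)
```

The truncated mean increases from 2 (as $\rho \to 0$) to infinity, which is why a root exists exactly when $2m/n > 2$; this is where the condition on $\liminf m/n$ is visible numerically.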
In fact, for all $a, b, x > 0$,

$$N(a,b) \le c^*\,(2b-1)!!\,\frac{f_2(x)^a}{x^{2b}}, \tag{34}$$

where $c^*$ does not depend on $a, b, x$. (The attentive reader certainly notices the direct analogy between these formulas and their counterparts for the bipartite case in Section 2.) The independent copies $Z_1, \dots, Z_n$ of $Z$ provide an approximation to $\deg(\Gamma)$, the degree sequence of the random graph $\Gamma$, in the following sense:

$$\Pr(\deg(\Gamma) \in B) = O\bigl(n^{1/2}\Pr(\mathbf{Z} \in B)\bigr), \tag{35}$$

uniformly for all sets $B$ of $n$-tuples. Consequently, if $B$ is such that $\Pr(\mathbf{Z} \in B)$ is $O(n^{-b})$ for some $b > 1/2$, then $\Pr(\deg(\Gamma) \in B) = O(n^{-(b-1/2)})$, which goes to zero, too! A particular event $B$, which will come in handy, is defined as follows.
Let d{j) = d(j, r) denotes the j-th largest degree of P. Pick a > e^^^{h{p) + 1)^ 
where h{p) = and define i{n,j) = [log ^1- Let us show that 

Pr(3j e [l,n] : d{j) > e{n,j)) = 0{n~^). (36) 

To prove this, consider first Z(j), the j-th largest among Zi, . . . , Z^. Clearly 

Pr{Z{j) > e{n,j)) < (^)pr^(Zi > %,j)) 

( en \ 

j log y + i log Pr(Zi > e{n, j))j , 

and, using the definition of i{n,j) and a, 

/ £(n 

Pr{Zi > i{n,j)) < Hp) /^^ jy - j 

/ . e^n\ 

< exp I -a log j • 

Consequently 

Pr(Z(j) > £{n,j)) < exp (-j(a - 1) log y) , 



n 

Pr(3i e [l,n] ; Z{j) > £(n,j)) < ^Pr(Z(j) > ^(n, j)) = 0{n-% 



SO that 







whence the probability in (36) is 0{n Now, for a given vertex subset 5', 

1^1 

j€S j=l 

and on the event in (36) 

® ^ r a 1 

E 7 / .\ . 6 71 . , ?2 

d{j) <Y log ~ < (2 + a)s + s log 

j=i j=i ^ ^ 

We conclude that 

Pr(35 C [n] : > (2 + a)|51 + |5| log(n/|5|)) = 0{n-^). (37) 

J€S 

This bound will be needed shortly. 

Now, given k G [2,en], let denote the total number of pairs {K,L) 

consisting of disjoint subsets $K, L \subset [n]$ such that $|K| = k$, $|L| = k - 1$, (i)-(iv) hold and $\mu$, $\nu$, $\nu_1$ are given by

$$|E(K)| + |E(L)| = \mu,$$

$$|\{(u,w) \in E(\Gamma) : u \in K,\ w \in L\}| = \nu,$$

$$|\{(u,w) \in E(\Gamma) : u \in L,\ w \in (K \cup L)^c\}| = \nu_1.$$

Note that by (iii)

$$\nu + \mu \ge 2k. \tag{38}$$

We want to show that




provided that $\varepsilon > 0$ is sufficiently small. By (37), we may and will confine ourselves to $\nu_1$ such that

$$\nu_1 \le A\,(k + k\log(n/k)), \tag{39}$$

for a large enough constant $A$. All we need to show is that

^ ^ E^,.,,,=0(n-^), (40) 

k<en 

“ (39) holds 

By symmetry. 



~\k,k-l,n-2k^l ) ’ 



where is the probability that the subsets A* = {1, . . . , fc} and L* = {fc + 

l,...,2fc - 1} form such a pair. To bound this probability we need to bound 
, the total number of graphs in question in which the pair {A*, L*} has the 
prescribed properties. 

Let $(\delta_j)_{j\in K^*}$, $(\delta_j)_{j\in L^*}$ and $(\delta_j)_{j\in (K^*\cup L^*)^c}$ be the degree sequences for the subgraphs $G(K^*)$, $G(L^*)$ and $G((K^*\cup L^*)^c)$ respectively. For $j \in K^*$ ($j \in L^*$ resp.) let $\Delta_j$ denote the total number of neighbors of $j$ in $L^*$ (in $K^*$ resp.). For $j \in L^*$ ($j \in (K^*\cup L^*)^c$ resp.) let $d_j$ denote the total number of neighbors of $j$ in $(K^*\cup L^*)^c$ (in $L^*$ resp.). Then







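$$\sum_{j\in K^*\cup L^*}\delta_j = 2\mu, \qquad \sum_{j\in (K^*\cup L^*)^c}\delta_j = 2(m-\mu-\nu-\nu_1),$$

$$\sum_{j\in K^*}\Delta_j = \sum_{j\in L^*}\Delta_j = \nu, \qquad \sum_{j\in L^*} d_j = \sum_{j\in (K^*\cup L^*)^c} d_j = \nu_1. \tag{42}$$

In addition,

$$\delta_j + \Delta_j \ge 2,\quad j \in K^*\cup L^*; \qquad \delta_j + d_j \ge 2,\quad j \in (K^*\cup L^*)^c. \tag{43}$$

It is worth noticing that (43) is a relaxed version of the actual restrictions. Also, lumping together $\delta_j$ for $j \in K^*$ and $j \in L^*$, we effectively ignore the fact that the graphs $G(K^*)$ and $G(L^*)$ are disjoint.

Denoting the total number of graphs with the given $D = (\delta, \Delta, d)$ by $N(D)$, and using the degree-dependent bounds for the counts of graphs, both general and bipartite, we obtain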



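$$N(D) \le \Bigl((2\mu-1)!!\prod_{j\in K^*\cup L^*}\frac{1}{\delta_j!}\Bigr)\times\Bigl(\nu!\prod_{j\in K^*\cup L^*}\frac{1}{\Delta_j!}\Bigr)\times\Bigl(\nu_1!\prod_{j\in L^*}\frac{1}{d_j!}\prod_{j\in (K^*\cup L^*)^c}\frac{1}{d_j!}\Bigr)\times\Bigl((2(m-\mu-\nu-\nu_1)-1)!!\prod_{j\in (K^*\cup L^*)^c}\frac{1}{\delta_j!}\Bigr)$$

$$= (2\mu-1)!!\,\nu!\,\nu_1!\,(2(m-\mu-\nu-\nu_1)-1)!!\cdot\Phi(D);$$

$$\Phi(D) = \prod_{j\in K^*\cup L^*}\frac{1}{\delta_j!\,\Delta_j!}\ \prod_{j\in L^*}\frac{1}{d_j!}\ \prod_{j\in (K^*\cup L^*)^c}\frac{1}{d_j!\,\delta_j!}. \tag{44}$$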



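Our task now is to evaluate the sum of $\Phi(D)$, for all $D$ that meet (42) and (43). To do so, let us first determine a multivariate generating function of $\Phi(D)$, for $D$ satisfying (43) only:

$$\sum_{D \text{ satisfies (43)}} y_1^{\sum_{j\in K^*\cup L^*}\delta_j}\; y_2^{\sum_{j\in K^*\cup L^*}\Delta_j}\; y_3^{\sum_{j\in L^*} d_j}\; y_4^{\sum_{j\in (K^*\cup L^*)^c} d_j}\; y_5^{\sum_{j\in (K^*\cup L^*)^c}\delta_j}\;\Phi(D)$$

$$= \Bigl(\sum_{\delta+\Delta\ge 2}\frac{y_1^{\delta}y_2^{\Delta}}{\delta!\,\Delta!}\Bigr)^{2k-1}\Bigl(\sum_{d\ge 0}\frac{y_3^{d}}{d!}\Bigr)^{k-1}\Bigl(\sum_{d+\delta\ge 2}\frac{y_4^{d}y_5^{\delta}}{d!\,\delta!}\Bigr)^{n-2k+1}$$

$$= f_2(y_1+y_2)^{2k-1}\,f_0(y_3)^{k-1}\,f_2(y_4+y_5)^{n-2k+1}. \tag{45}$$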





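So now the sum of $\Phi(D)$ over all $D$ meeting (42) and (43) is the coefficient of $y_1^{2\mu}\,y_2^{2\nu}\,y_3^{\nu_1}\,y_4^{\nu_1}\,y_5^{2(m-\mu-\nu-\nu_1)}$ in the function on the right hand side of (45). Using

$$[z_1^{a} z_2^{b}]\,F(z_1 + z_2) = \binom{a+b}{a}\,[z^{a+b}]\,F(z)$$

and $f_0(y_3) = e^{y_3}$, we obtain then:

$$\sum_{D \text{ satisfies (42), (43)}}\Phi(D) = \binom{2(\mu+\nu)}{2\mu}\binom{2(m-\mu-\nu)-\nu_1}{\nu_1}\frac{(k-1)^{\nu_1}}{\nu_1!}\times[x^{2(\mu+\nu)}]\,f_2(x)^{2k-1}\cdot[x^{2(m-\mu-\nu)-\nu_1}]\,f_2(x)^{n-2k+1}. \tag{46}$$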
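The coefficient-extraction identity used to pass from (45) to (46) can be checked on a truncated power series. The sketch below is an illustration with hypothetical names: it expands $F(z_1 + z_2)$ term by term as a bivariate polynomial with exact rational coefficients, without using the identity itself, and then compares each coefficient with $\binom{a+b}{a}[z^{a+b}]F(z)$.

```python
from fractions import Fraction
from math import comb, factorial


def expand_sum_argument(coeffs):
    """Expand F(z1 + z2) for F(z) = sum_n coeffs[n] z^n.

    Returns a dict mapping (a, b) to the coefficient of z1^a z2^b.
    """
    result = {}
    power = {(0, 0): Fraction(1)}  # current power (z1 + z2)^n, starting with n = 0
    for c in coeffs:
        for key, val in power.items():
            result[key] = result.get(key, Fraction(0)) + c * val
        # Multiply the current power by (z1 + z2).
        nxt = {}
        for (i, j), val in power.items():
            nxt[(i + 1, j)] = nxt.get((i + 1, j), Fraction(0)) + val
            nxt[(i, j + 1)] = nxt.get((i, j + 1), Fraction(0)) + val
        power = nxt
    return result


# Truncation of f_2(z) = sum_{n >= 2} z^n / n! at degree 8 (illustrative choice).
F2_TRUNCATED = [Fraction(0), Fraction(0)] + [Fraction(1, factorial(n)) for n in range(2, 9)]
EXPANDED = expand_sum_argument(F2_TRUNCATED)
```

Every entry `EXPANDED[(a, b)]` then equals `comb(a + b, a) * F2_TRUNCATED[a + b]`, which is the identity applied twice in (46): once to the pair $(y_1, y_2)$ and once to $(y_4, y_5)$.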

Here

$$[x^{2(\mu+\nu)}]\,f_2(x)^{2k-1} \le \frac{f_2(x_1)^{2k-1}}{x_1^{2(\mu+\nu)}}, \qquad \forall\, x_1 > 0,$$

and we will see that a sufficiently small $x_1$ will do the job. We can write an analogous bound for the last factor in (46), and (in the light of (32) and the relative smallness of our parameters $\mu$, $\nu$, $\nu_1$) $x_2 = \rho$ is a natural choice. In fact, we can do a bit better and get an extra factor by applying the Cauchy (circular) contour formula, cf. (82), in combination with (83). Using


p 

N 



N 

N{n,m) ’ 

E 

D satisfies (42), (43) 



(32), (44), and an inequality 



we obtain then 



Cr) 



{2{u-v) - 1)!! < 



(2«-l)!! 






{p{k-i)r 



Then since 



L 

v>i >0 



(p(fc-i)r 

i/i\ 



= <eP\ 



(47) 



we get the bound (call it for which is (47) with the last factor 

replaced by 

Next, for ly > v'o = ^o(m) •= max{2(fc — l),2/c — //}, using (39), 







x{ vm 

p^ / k‘^ + k‘^ log^{n/k) V \ 
x\ \ km ^ m) 

‘ + ^log^(lA)) 

Xi 

< 1 / 2 , 

if fc < en and 0 < 6 < ei(:ri) is chosen sufficiently small. For this choice of e, 

= 0((3^,i.o(^)). (48) 

Furthermore, if p> 2 then i/q = = 2(fc — 1) and we have 



Q ^ A + k^ 

— Sib~2 ' 




<6^ • (^ + ^ffig^(l/ ^)) • “ 

Xi p 

< 1 / 2 , 



if /I > and 0 < e < e 2 (xi) < ei(xi) is chosen sufficiently small. If so, 

'^Qix,i^o ; 2 < jU < (49) 

/x>2 ^ 



To make the last bound explicit, we use (2a — 1)!! = (2a)!/(2“o!) and the 
Stirling formula for factorials to bound, for v = vq and 2 < /r < the 

combinatorial factors in (47) as follows: 



( 2 ^- 1 )!! 
/2(p + i2o)\ 

V 2m ) 

{2{m- h-Pq) - 1 )!! 

(2m -1)!! 



<fc 

<fc 

<fc 




exp(0(fce^^^ log e ^)); 
(2m)-('^-2)-2^exp(0(efc)). 



Using these bounds and k‘^vo\ < (2k)\, we obtain: for 2 < m < and xi < A, 



Q. 









X e 



pk 



Xi/ 



0(e^/2^) 



exp(0(fce^^^ loge ^)). 



(50) 







If /i = 0 then pq = 2k ^ and if /i = 1 then Po = 2k — l, The direct computation shows 
that the bound (50) holds in these two remaining cases as well. So, collecting the 
pieces and using 

(2fc)! 



we get 



where 






k\{k-l)l 
^q- 



<b k 2 



,2k 



<bn ^/V*"exp((9(fc€^/2loge ^)), 



q = 2 



f2{xi) 



2m 



pP/2 



Using (33), we transform (52) into 

q = 



f2{p) 

2/2 (a:i) 



(51) 



(52) 



gp/2 — p/2 



The first fraction is strictly less than, and bounded away from, 1. (That's where the condition "$\liminf m/n > 1$" enters!) And the second fraction approaches 1, from above, when $x_1 \downarrow 0$. So we can pick $x_1$ small enough to make $q < 1$. For this choice of $x_1$ and the corresponding $\varepsilon = \varepsilon(x_1) < \varepsilon_2(x_1)$, we have



g^exp(0(e^/^loge ^)) < qi := 



1 + q^ 



< 1 . 



Then (51) implies that 



/C<€TI /c>0 

Thus (40) is completely proved, and so is Lemma 4.2. □ 

Note. Let $X_n$ denote the total number of isolated odd cycles in $G^{\delta\ge 2}_{n,m}$ and let $\hat G_n$ denote the random subgraph obtained by deletion of all $X_n$ cycles in question. If $\mu^*(\hat G_n) < \lfloor (n - X_n)/2\rfloor$ then, by the previous lemma, $\hat G_n$ must contain a witness $(K, L)$. According to the last lemma, whp $K$ has to be large, of size $\varepsilon n$, at least.



4.2. Step 2. Using a witness to gainfully add new edges. 

Introduce $\partial E$, the set of non-edges $(x,y)$ of the random (sub)graph $\hat G_n$, such that adding $(x, y)$ to the edge set makes the maximum matching number increase by one.



Lemma 4.3. 

Pr(0 < \dE\ < e^n^/2) = 0{n-^). 

Proof. Suppose the event $\{|\partial E| > 0\} \cap A_n(\varepsilon)^c$ happens. Then, by Lemma 4.1, for every vertex $x \in V := V(\hat G_n)$, not covered by at least one maximum matching (in $\hat G_n$, needless to say), there exists a vertex set $K \subset V$ of cardinality $\varepsilon n$ or more, such that (1) $x \notin K$, $x \notin N_{\hat G_n}(K)$; (2) adding any $(x,y)$, $y \in K$, to the edge set of $\hat G_n$ increases the maximum matching number; (3) for every vertex $y \in K$ there exists a maximum matching that does not cover $y$. This implies existence of the vertices $x_1, \dots, x_\nu \in V$, $\nu \ge \varepsilon n$, such that for every $x_j$ there is a corresponding vertex subset $Y_j \subset V$ satisfying the conditions: (1) $x_j \notin Y_j$, $x_j \notin N_{\hat G_n}(Y_j)$; (2) for every $y \in Y_j$, adding $(x_j,y)$ to the edge set increases the maximum matching number; (3) $|Y_j| \ge \varepsilon n$. Consequently the edge set of $\hat G_n$ is missing at least






i=i 



+ 1 ) ^ 

2 - 2 



pairs $(x,y)$ such that adding any such pair to the edge set would increase the maximum matching. Therefore

$$\{0 < |\partial E| < \varepsilon^2 n^2/2\} \subseteq A_n(\varepsilon),$$

and the claim follows from Lemma 4.2. □

Note. Paraphrasing Lemma 4.3: with the probability guaranteed by that lemma, the subgraph $\hat G_n$ (equal to $G^{\delta\ge 2}_{n,m}$ minus all its isolated odd cycles) either has a perfect matching or there are at least $an^2$ pairs of vertices $(u,v) \in V \times V$, $(u,v) \notin E(\hat G_n)$, such that adding any such $(u, v)$ would increase the matching number.



4.3. Step 3. Counting the isolated odd cycles. 

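Clearly, we need to determine the limiting distribution of the total number of isolated odd cycles in $G^{\delta\ge 2}_{n,m}$.

Lemma 4.4. Let $\liminf m/n > 1$. Then $X_n$ is, in the limit, Poisson$(\lambda)$, i.e.

$$\Pr(X_n = j) \to e^{-\lambda}\frac{\lambda^j}{j!}, \qquad j \ge 0. \tag{53}$$

Here

$$\lambda = \frac{1}{4}\log\frac{1+\sigma}{1-\sigma} - \frac{\sigma}{2}, \qquad \sigma = \frac{\rho}{e^{\rho} - 1}, \tag{54}$$

and $\rho$ is defined in (33).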
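The closed form for $\lambda$ in (54) is the sum of $\sigma^\ell/(2\ell)$ over odd $\ell \ge 3$, which is the series that appears at the end of the proof. The sketch below (hypothetical function names; $\sigma = \rho/(e^\rho - 1)$ taken as in (54)) compares a partial sum with the closed form.

```python
import math


def sigma_from_rho(rho: float) -> float:
    """sigma = rho / (e^rho - 1), which lies in (0, 1) for rho > 0."""
    return rho / math.expm1(rho)


def lambda_closed_form(sigma: float) -> float:
    """lambda = (1/4) log((1 + sigma)/(1 - sigma)) - sigma/2."""
    return 0.25 * math.log((1.0 + sigma) / (1.0 - sigma)) - 0.5 * sigma


def lambda_partial_sum(sigma: float, max_length: int) -> float:
    """Sum of sigma^l / (2 l) over odd cycle lengths l = 3, 5, ..., up to max_length."""
    return sum(sigma ** l / (2.0 * l) for l in range(3, max_length + 1, 2))
```

Since $\sigma < 1$, the partial sums converge geometrically, so cutting the cycle length at a slowly growing $L(n)$ changes the Poisson parameter only by $o(1)$.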



Proof. Let $X_{n,\ell}$ denote the total number of isolated odd cycles of length $\ell \ge 3$. Then, given $L \ge 3$,






n\ {£ - 1)! N{n - i,m — i) 
2 N{n, m) 



Using (32) and (34) with x = p, and 
e-i 

n-j 



n 



“2(m-j)-l 



< 



n 



2m - 1 

we see that the generic term in the sum is of order at most 

r.2 1 



1 



n 



2i\jn-i \2m f 2 {p) 



n 



• a . 



n 



(55) 



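Since $\sigma < 1$, it easily follows then that $\sum_{\ell \ge L} E\,X_{n,\ell} \to 0$ if $L = L(n) \to \infty$ however slowly. Consequently, whp there are no isolated odd cycles of length exceeding $L = L(n)$. Introduce

$$X_n^* = \sum_{\ell \le L(n),\ \ell \text{ odd}} X_{n,\ell}.$$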







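Then, for every fixed $k \ge 1$,

$$E[(X_n^*)_k] = \sum_{\ell \le kL} R_{n,m}(\ell) \sum_{\substack{\ell_1 + \dots + \ell_k = \ell \\ \ell_1, \dots, \ell_k \text{ odd},\ \ell_i \in [3,L]}}\ \prod_{i=1}^{k}\frac{(\ell_i - 1)!}{2\,(\ell_i!)} = \sum_{\ell \le kL} R_{n,m}(\ell)\,[x^{\ell}]\Bigl(\sum_{j \in [3,L],\ j \text{ odd}} \frac{x^j}{2j}\Bigr)^{k},$$

$$R_{n,m}(\ell) := \frac{n!\,N(n - \ell, m - \ell)}{(n - \ell)!\,N(n,m)}.$$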



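For $\ell \le L(n)$ and $L(n) \to \infty$ sufficiently slowly, $N(n - \ell, m - \ell)$ is asymptotic to the RHS in (32), with $n$ and $m$ replaced by $n - \ell$ and $m - \ell$. (The point here is that the difference between $\rho$ and $\rho(\ell)$ corresponding to $n - \ell$, $m - \ell$ is of order $O(L/n)$, and this difference leads to an extra factor $\exp(O(L^2/n)) \to 1$, provided that $L = o(n^{1/2})$.) Consequently $R_{n,m}(\ell) \sim \sigma^{\ell}$, uniformly for $\ell \le L$. Therefore, using $\sigma < 1$,

$$E[(X_n^*)_k] \sim \sum_{\ell \le kL} \sigma^{\ell}\,[x^{\ell}]\Bigl(\sum_{j \in [3,L],\ j \text{ odd}} \frac{x^j}{2j}\Bigr)^{k} = \sum_{\ell \le kL} [x^{\ell}]\Bigl(\sum_{j \in [3,L],\ j \text{ odd}} \frac{(\sigma x)^j}{2j}\Bigr)^{k} = \Bigl(\sum_{j \in [3,L],\ j \text{ odd}} \frac{\sigma^j}{2j}\Bigr)^{k}.$$

Thus $X_n^*$ is in the limit Poisson with parameter

$$\sum_{j \in [3,L],\ j \text{ odd}} \frac{\sigma^j}{2j} = \frac{1}{4}\log\frac{1+\sigma}{1-\sigma} - \frac{\sigma}{2} + o(1) = \lambda + o(1).$$

Since $\Pr(X_n \ne X_n^*) \to 0$, the proof is complete. □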

As a brief summary of Steps 1-3, we have established that whp $G^{\delta\ge 2}_{n,m}$ contains few isolated odd cycles, and upon deletion of these cycles we end up with a subgraph such that if it has no perfect matching, then there are of order $n^2$ non-edges, whose individual insertions would increase the maximum matching number.

To capitalize on these results, we need to find a way to compare the maximum matching number for two different values of $m$. And this is a serious challenge, since (unlike the Erdős-Rényi random graph $G(n, m)$) we do not know of any construction which would have allowed us to consider $G^{\delta\ge 2}_{n,m_1}$ as a subgraph of $G^{\delta\ge 2}_{n,m_2}$ for $m_1 < m_2$. The next step shows how such a coupling can be done asymptotically.







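4.4. Step 4. An asymptotic coupling of $G^{\delta\ge 2}_{n,m-\omega}$ and $G^{\delta\ge 2}_{n,m}$.

Let $\omega = \lfloor r\log n\rfloor$ for some constant $r > 0$. Consider the bipartite graph $\Gamma$ with vertex set bipartition $\mathcal{G}^{\delta\ge 2}_{n,m-\omega} + \mathcal{G}^{\delta\ge 2}_{n,m}$ and the edge set $E(\Gamma)$ defined by the condition: for $G \in \mathcal{G}^{\delta\ge 2}_{n,m-\omega}$ and $H \in \mathcal{G}^{\delta\ge 2}_{n,m}$, $(G,H) \in E(\Gamma)$ iff

$$E(G) \subset E(H) \quad\text{and}\quad E(H) \setminus E(G) \text{ is a matching.}$$

(So $(G, H) \in E(\Gamma)$ iff $E(H)$ is obtained by adding to $E(G)$ $\omega$ independent edges.) Consider the following experiment SAMPLE:

• Choose $G$ randomly from $\mathcal{G}^{\delta\ge 2}_{n,m-\omega}$.

• Add a random matching $M$, disjoint from $E(G)$, of size $\omega$ to obtain $H \in \mathcal{G}^{\delta\ge 2}_{n,m}$.

This induces a probability measure $Q$ on $\mathcal{G}^{\delta\ge 2}_{n,m}$. We show that $Q$ is nearly uniform.

For $v \in \mathcal{G}^{\delta\ge 2}_{n,m-\omega} + \mathcal{G}^{\delta\ge 2}_{n,m}$, let $d_\Gamma(v)$ denote the degree of $v$ in $\Gamma$.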

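Lemma 4.5. $G \in \mathcal{G}^{\delta\ge 2}_{n,m-\omega}$ implies

$$\frac{\bigl(\binom{n}{2} - m - 2\omega n\bigr)^{\omega}}{\omega!} \le d_\Gamma(G) \le \frac{1}{\omega!}\binom{n}{2}^{\omega}.$$

Proof. The RHS is obvious. For the LHS let us bound from below the number of ordered sequences $e_1, e_2, \dots, e_\omega$ of $\omega$ edges which are not in $E(G)$, and form a matching. Observe that after choosing $e_1, e_2, \dots, e_i$ we rule out at most $m - \omega + 2in$ choices for $e_{i+1}$. (The $m - \omega$ edges of $G$ plus the further $\le 2i(n - 2)$ choices of new edges incident with $e_1, e_2, \dots, e_i$.) Thus there are always at least $\binom{n}{2} - m - 2\omega n$ choices for $e_{i+1}$. Dividing by $\omega!$ accounts for removing the ordering.

□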
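The degree $d_\Gamma(G)$ is simply the number of matchings of size $\omega$ that avoid $E(G)$, since adding edges cannot lower the minimum degree. For a very small graph this can be counted exhaustively. The sketch below uses hypothetical names and an illustrative instance (the 14-cycle with $\omega = 2$, so that $m - \omega = 14$); the bounds it is compared against are the ones read off from the counting argument in the proof, namely $(\binom{n}{2} - m - 2\omega n)^{\omega}/\omega!$ from below and $\binom{n}{2}^{\omega}/\omega!$ from above, which is our reading of the damaged statement.

```python
from itertools import combinations
from math import comb, factorial


def count_matching_extensions(n: int, edges, omega: int) -> int:
    """Count sets of omega pairwise disjoint non-edges of a graph on vertices 0..n-1."""
    present = {frozenset(e) for e in edges}
    non_edges = [p for p in combinations(range(n), 2) if frozenset(p) not in present]
    total = 0
    for chosen in combinations(non_edges, omega):
        endpoints = [v for e in chosen for v in e]
        if len(set(endpoints)) == 2 * omega:  # the chosen pairs form a matching
            total += 1
    return total


def lower_bound(n: int, m: int, omega: int) -> float:
    """(C(n,2) - m - 2*omega*n)^omega / omega!; meaningful only when the base is positive."""
    return (comb(n, 2) - m - 2 * omega * n) ** omega / factorial(omega)


def upper_bound(n: int, omega: int) -> float:
    """C(n,2)^omega / omega!."""
    return comb(n, 2) ** omega / factorial(omega)


CYCLE_14 = [(i, (i + 1) % 14) for i in range(14)]  # minimum degree 2, 14 edges
```

For the 14-cycle with $\omega = 2$ (hence $m = 16$) the exhaustive count is 2156, between the two bounds 180.5 and 4140.5. For much smaller n the base of the lower bound is negative and the bound carries no information, in line with the lemma being an asymptotic statement.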

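Thus for $n$ large and $G, G' \in \mathcal{G}^{\delta\ge 2}_{n,m-\omega}$,

$$\left|\frac{d_\Gamma(G)}{d_\Gamma(G')} - 1\right| = O\bigl(\omega^2/n\bigr). \tag{56}$$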



We need to prove analogous estimates for the degrees $d_\Gamma(H)$, $H \in \mathcal{G}^{\delta\ge 2}_{n,m}$. To this end, let $\Delta(H)$ denote the maximum vertex degree in $H$, and let $E_{>}(H)$ be the set of edges of $H$ joining vertices of degree at least 3. (Why look at $E_{>}(H)$? Well, if $(G,H)$ is an edge in $\Gamma$, and $e \in E(H) \setminus E(G)$, then the other edges of $H$ incident to $e$ must already be in $E(G)$. So $E(H) \setminus E(G) \subset E_{>}(H)$.)



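Lemma 4.6. Let

$$\theta = \frac{\rho^2}{2c}, \quad\text{where}\quad \frac{\rho(e^{\rho} - 1)}{e^{\rho} - 1 - \rho} = c, \quad c = \frac{2m}{n}.$$

If $H$ is chosen uniformly at random from $\mathcal{G}^{\delta\ge 2}_{n,m}$, then qs (see the footnote below)

(a) $\Delta(H) \le \log n$.

(b) $\bigl|\,|E_{>}(H)| - \theta n\,\bigr| = O(n^{1/2}\log n)$.

Footnote: a sequence of events $\mathcal{E}_n$ is said to occur quite surely (qs) if $\Pr(\mathcal{E}_n) = 1 - O(n^{-K})$ for any $K > 0$.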







Proof. Let $Z_1, \dots, Z_n$ be independent copies of $\mathrm{Po}(\rho; \ge 2)$. Introduce the random multiset $S_{\mathbf Z}$ that contains $Z_i$ (distinguishable) copies of the vertex $i$, $1 \le i \le n$; denote $s_{\mathbf Z} = |S_{\mathbf Z}|$. Given $S_{\mathbf Z}$, we choose uniformly at random a "pairing" of all $s_{\mathbf Z}$ elements of $S_{\mathbf Z}$. A convenient way of generating such a pairing is to choose a random permutation $\pi = (\pi_1, \dots, \pi_{s_{\mathbf Z}})$ of $S_{\mathbf Z}$ and to form pairs $(\pi_1,\pi_2)$, $(\pi_3,\pi_4)$, .... (When $s_{\mathbf Z}$ is odd, one vertex in $S_{\mathbf Z}$ remains without a partner.) Conditional on the event $\mathcal{E}_0 = \{s_{\mathbf Z} = 2m\} \cap \{\text{pairing is graph-induced}\}$, the uniformly random pairing (permutation $\pi$) defines a uniformly random graph $H \in \mathcal{G}^{\delta\ge 2}_{n,m}$. Like the bipartite case in Section 2, we have that



Pr(£o) = E (F(Z) • = 



where 



F(Z) =expH(Z)/2-r72(z)/4 + 0(m^Zym)), t?(Z) = ~ 1), 



cf. [21]. Implicit in (32), (33) is

$$\Pr(\mathcal{E}_0) = (1 + o(1))\,\frac{\exp(-\eta/2 - \eta^2/4)}{\sqrt{2\pi n\,\mathrm{Var}(Z)}}, \tag{57}$$

i.e. $\Pr(\mathcal{E}_0)$ is only polynomially small, of order $n^{-1/2}$ exactly. This implies that if $\{\mathbf{Z} \in A\}$, $A \subset \{2,3,\dots\}^n$, is a qs event, then so is the event $\{\deg(H) \in A\}$, $\deg(H)$ denoting the degree sequence of $H \in \mathcal{G}^{\delta\ge 2}_{n,m}$. The part (a) follows then immediately since, for $L = \log n$,

$$\Pr\bigl(\max_j Z_j \ge L\bigr) \le n\Pr(Z_1 \ge L) = O\bigl(n\rho^{L}/L!\bigr),$$

which is $O(n^{-K})$ for any $K > 0$.

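Turn to (b). Let $W$ be the number of pairs $(\pi_{2i-1}, \pi_{2i})$ in the random permutation $\pi$ of the multiset $S_{\mathbf Z}$ such that both $\pi_{2i-1}$ and $\pi_{2i}$ are copies of the vertices of degree 3 or more. We know that, conditioned on the event $\mathcal{E}_0$, there is $W = |E_{>}(H)|$. And it is easy to see that

$$E(W \mid \mathbf{Z}) = \frac{s_{\mathbf Z,3}\,(s_{\mathbf Z,3} - 1)}{2\,(s_{\mathbf Z} - 1)}, \tag{58}$$

where

$$s_{\mathbf Z,3} = \sum_{j} Z_j\,\mathbf{1}\{Z_j \ge 3\}.$$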


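Now

$$E(s_{\mathbf Z}) = nE\,Z_1 = n\,\frac{\rho(e^{\rho} - 1)}{e^{\rho} - 1 - \rho} = 2m, \tag{59}$$

$$E(s_{\mathbf Z,3}) = n\bigl(E\,Z_1 - 2\Pr(Z_1 = 2)\bigr) = n\left(\frac{\rho(e^{\rho} - 1)}{e^{\rho} - 1 - \rho} - \frac{\rho^2}{e^{\rho} - 1 - \rho}\right) \tag{60}$$

$$= n\rho. \tag{61}$$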






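And $s_{\mathbf Z}$, $s_{\mathbf Z,3}$ are the sums of independent copies of $Z$ and $Z\,\mathbf{1}\{Z \ge 3\}$ respectively. Using the pgf's

$$E\bigl(x^{Z}\bigr) = \frac{f_2(x\rho)}{f_2(\rho)}, \qquad E\bigl(x^{Z\mathbf{1}\{Z\ge 3\}}\bigr) = \frac{\rho^2/2 + f_3(x\rho)}{f_2(\rho)},$$

(59), (61), in a standard (Chernoff-type) way, we obtain that qs

$$|s_{\mathbf Z} - 2m| \le n^{1/2}\log n, \qquad |s_{\mathbf Z,3} - n\rho| \le n^{1/2}\log n. \tag{62}$$

Denote the event in (62) by $\mathcal{E}_1$. Then $\mathcal{E}_1$ holds qs. It follows from (58) that, on the event $\mathcal{E}_1$,

$$E(W \mid \mathbf{Z}) = \theta n + O(n^{1/2}\log n), \qquad \theta := \frac{n\rho^2}{4m}. \tag{63}$$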

Next we appeal to the Azuma-Hoeffding inequality to show that, conditional on $\mathbf{Z}$, $W$ is tightly concentrated around $E(W \mid \mathbf{Z})$. The A-H inequality applies since transposing any two elements of a permutation of $S_{\mathbf Z}$ may change $W$ by at most 2, see Appendix B. So, for every $u > 0$,



Pr(|W - E(IT \Z)\>u\Z)< 

Removing the conditioning on Z, and using the definition of the event £i, we 
obtain 

Pr(|IU - E{W I Z)| >u)< Pr(£J) + 

So, substituting $u = n^{1/2}\log n$ and using (63), we see that qs

$$|W - \theta n| \le A\,n^{1/2}\log n,$$

if a constant $A$ is sufficiently large. Recalling that $W = |E_{>}(H)|$ on the event $\mathcal{E}_0$, and that $\Pr(\mathcal{E}_0)$ is of order $n^{-1/2}$, we complete the proof of the part (b). □
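The pairing construction used in this proof can be sketched in code. This is a minimal illustration with hypothetical names, not the authors' procedure verbatim: degrees are drawn as independent Poisson($\rho$) variables conditioned on being at least 2 (by rejection), the copies of the vertices are shuffled and paired consecutively, and an attempt is accepted only on the event that the degrees sum to 2m and the pairing has no loops or repeated edges.

```python
import math
import random


def sample_truncated_poisson(rho: float, rng: random.Random, minimum: int = 2) -> int:
    """Draw Poisson(rho) by Knuth's multiplication method, rejecting values below `minimum`."""
    limit = math.exp(-rho)
    while True:
        k, prod = 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        if k >= minimum:
            return k


def pairing_attempt(n: int, rho: float, m: int, rng: random.Random):
    """One attempt: return a set of m edges, or None if the attempt is rejected."""
    degrees = [sample_truncated_poisson(rho, rng) for _ in range(n)]
    if sum(degrees) != 2 * m:
        return None  # the event {s_Z = 2m} failed
    points = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(points)  # uniformly random permutation of the vertex copies
    edges = set()
    for a, b in zip(points[0::2], points[1::2]):
        edge = frozenset((a, b))
        if a == b or edge in edges:
            return None  # a loop or a repeated edge: the pairing is not graph-induced
        edges.add(edge)
    return edges


def sample_graph(n: int, rho: float, m: int, rng: random.Random, max_tries: int = 200000):
    """Repeat attempts until one is accepted; every vertex then has degree at least 2."""
    for _ in range(max_tries):
        graph = pairing_attempt(n, rho, m, rng)
        if graph is not None:
            return graph
    raise RuntimeError("no accepted pairing within max_tries")
```

A call such as `sample_graph(8, 1.5, 10, random.Random(0))` returns a set of 10 edges on 8 vertices. The rejection loop is practical only for small instances; for large n the acceptance probability is of order $n^{-1/2}$ by (57), which is exactly why the proof can transfer "quite surely" statements from $\mathbf{Z}$ to the graph.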

Now let $\mathcal{S}$ be the set of $H \in \mathcal{G}^{\delta\ge 2}_{n,m}$ satisfying the conditions of the above lemma, i.e.

• The number of edges joining two vertices of degree $\ge 3$ is in the range $\theta n \pm A n^{1/2}\log n$ for some constant $A > 0$.

• The maximum degree $\Delta(H) \le \log n$.

According to Lemma 4.6,

$$|\mathcal{G}^{\delta\ge 2}_{n,m} \setminus \mathcal{S}| \le |\mathcal{S}|\,n^{-K}, \qquad \forall K > 0. \tag{64}$$

Note next that

Lemma 4.7. $H \in \mathcal{S}$ implies

$$\frac{(\theta n - A n^{1/2}\log n - 2\omega\log n)^{\omega}}{\omega!} \le d_\Gamma(H) \le \binom{\theta n + A n^{1/2}\log n}{\omega}.$$

Proof. The upper bound follows from the earlier observation, namely that every edge among the $\omega$ edges added to a graph $G \in \mathcal{G}^{\delta\ge 2}_{n,m-\omega}$ to obtain the graph $H$ must connect two vertices which have degree 3 or more in $H$, and from the condition $H \in \mathcal{S}$. For the LHS, as in Lemma 4.5, let us bound from below the number of ordered sequences $e_1, e_2, \dots, e_\omega$ of $\omega$ edges which are contained in $E_{>}(H)$ and form a matching. Observe that after choosing $e_1, e_2, \dots, e_i$ we rule out at most $2i\Delta(H)$ choices for $e_{i+1}$, since we have restricted ourselves to matchings. Thus there are always at least $\theta n - A n^{1/2}\log n - 2\omega\Delta$ choices for $e_{i+1}$. Dividing by $\omega!$ accounts for removing the ordering. □







So for H, H' e g, 



2Auj log n 

Finally, for H € Si-^ \ § and if' € 3, 



dr(H) 

dr{H') 



dr{H) ^ 
dr(H') - 



C) 

(0n— An^/2 log n—2u) log n)^ 
ui\ 




2m 




(65) 



(66) 



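as the total number of ways to delete a matching of size $\omega$ from $H \in \mathcal{G}^{\delta\ge 2}_{n,m}$ is $\binom{m}{\omega}$ at most.

Having proved the bounds (56), (65), (66), we can now show that the distribution $Q$ on $\mathcal{G}^{\delta\ge 2}_{n,m}$, induced by the uniform distribution on $\mathcal{G}^{\delta\ge 2}_{n,m-\omega}$ via the SAMPLE procedure, is nearly uniform itself.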

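Let $G_0 \in \mathcal{G}^{\delta\ge 2}_{n,m-\omega}$ be fixed. By (56), if $H \in \mathcal{G}^{\delta\ge 2}_{n,m}$ then

$$Q(H) = \Pr(\text{SAMPLE chooses } H) = \frac{1}{|\mathcal{G}^{\delta\ge 2}_{n,m-\omega}|} \times \sum_{G:\,(G,H)\in E(\Gamma)} \frac{1}{d_\Gamma(G)} = \frac{1 + O(\omega^2/n)}{|\mathcal{G}^{\delta\ge 2}_{n,m-\omega}|}\cdot\frac{d_\Gamma(H)}{d_\Gamma(G_0)}. \tag{67}$$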



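From this relation, (65), and (66), it follows that

$$H, H' \in \mathcal{S} \quad\text{implies}\quad \left|\frac{Q(H)}{Q(H')} - 1\right| \le \frac{3A\omega\log n}{\theta n^{1/2}}, \tag{68}$$

$$H \in \mathcal{G}^{\delta\ge 2}_{n,m} \setminus \mathcal{S},\ H' \in \mathcal{S} \quad\text{implies}\quad \frac{Q(H)}{Q(H')} \le \Bigl(\frac{2c}{\theta}\Bigr)^{\omega}. \tag{69}$$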

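Furthermore, invoking also

$$\sum_{G \in \mathcal{G}^{\delta\ge 2}_{n,m-\omega}} d_\Gamma(G) = \sum_{H \in \mathcal{G}^{\delta\ge 2}_{n,m}} d_\Gamma(H),$$

and picking $H' \in \mathcal{S}$, we obtain (see (56), (65)):

$$\frac{d_\Gamma(H')}{d_\Gamma(G_0)} \le \Bigl(1 + \frac{4A\omega\log n}{\theta n^{1/2}}\Bigr)\frac{|\mathcal{G}^{\delta\ge 2}_{n,m-\omega}|}{|\mathcal{S}|}. \tag{70}$$

Combining (64), (67), (69), and (70), we get: for every $K > 0$,

$$Q(\mathcal{G}^{\delta\ge 2}_{n,m} \setminus \mathcal{S}) \le Q(H')\Bigl(\frac{2c}{\theta}\Bigr)^{\omega} n^{-2K}\,|\mathcal{S}| = \frac{1 + O(\omega^2/n)}{|\mathcal{G}^{\delta\ge 2}_{n,m-\omega}|}\cdot\frac{d_\Gamma(H')}{d_\Gamma(G_0)}\Bigl(\frac{2c}{\theta}\Bigr)^{\omega} n^{-2K}\,|\mathcal{S}| = O\bigl((2c/\theta)^{\omega} n^{-2K}\bigr) \le n^{-K}. \tag{71}$$

(As $\omega = \lfloor r\log n\rfloor$, the last bound holds for $K > K(r)$.) Finally, since $Q(\mathcal{G}^{\delta\ge 2}_{n,m}) = 1$, from (64), (71) and (68) we deduce that, for $H \in \mathcal{S}$,

$$\left|Q(H) - \frac{1}{|\mathcal{G}^{\delta\ge 2}_{n,m}|}\right| \le \frac{1}{|\mathcal{G}^{\delta\ge 2}_{n,m}|}\cdot\frac{5A\omega\log n}{\theta n^{1/2}}. \tag{72}$$

This means that on the graph set $\mathcal{S}$ the probability measure $Q$ is almost uniform.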



4.5. Step 5. Asymptotic monotonicity of the tail distribution of the maximum 
matching number. 



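We will use $\hat G = \hat G(G)$ to denote the subgraph of $G$ obtained by deletion of all isolated odd cycles in $G$, and $X(G)$ to denote the total number of these cycles. Clearly

$$\mu^*(\hat G) \le \Bigl\lfloor\frac{n - X(G)}{2}\Bigr\rfloor \quad\text{if } |V(G)| = n.$$

Let $M$ stand for a generic value of the number of edges, and let $\Pr_M$ denote the uniform distribution on $\mathcal{G}^{\delta\ge 2}_{n,M}$. We will use $Q_M$ to denote the probability distribution induced on $\mathcal{G}^{\delta\ge 2}_{n,M}$ by the uniform distribution $\Pr_{M-\omega}$ on $\mathcal{G}^{\delta\ge 2}_{n,M-\omega}$ via the SAMPLE procedure analyzed in Step 4. To shorten notations and to underscore dependence on $M$, we will write $G_M$ and $\hat G_M$ instead of $G^{\delta\ge 2}_{n,M}$ and $\hat G(G^{\delta\ge 2}_{n,M})$ respectively, and $X_M$ instead of $X(G^{\delta\ge 2}_{n,M})$. Introduce

$$e(M,t) = \Pr_M\Bigl(\mu^*(\hat G_M) < \Bigl\lfloor\frac{n - X_M}{2}\Bigr\rfloor - t\Bigr).$$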

Lemma 4.8. Let u G (0,1) he so small that (1 — u) liminf m/n > 1. Then, for 
M G [(1 — u)m, m] and t <tn — log~^ n, 



$$e(M, t-1) \le e(M - \omega, t) + \frac{6A\omega\log n}{\theta n^{1/2}}. \tag{73}$$



Proof. 

.... p(5>2 , p5>2 

tition + S - 



be a right-side vertex in the graph P on the bipar- 
^ ^ . Suppose that 






n - X{H) 



— t 



Then, for every G G Nr{H), either /i*(G) < [{n-X{H))/2\ - t, or ^*(G) = 
[{n - X{H))/2\ — t, in which case none of uj non-edges (u, u), added to E(G), is 
in dE{G{G)). We know that 

Pr (1 < \dE{GM-u,)\ < an^) = 0{n-^). 

M — UJ 



Besides, conditionally on Gm-lu, the probability that none of u random vertex 
disjoint non-edges belongs to dE{GM-uj)i is bounded by 



\dE(GM-u.)\-ojY 

il) J ■ 







And for \dE{GM-uj)\ > the last bound is at most (1 - a)^ <n if we pick 
r in a; = [r log n\ sufficiently large. Therefore 



QmU*(^)< 



n - X{H) 



^ + 1) < Pr {/^*(GM-a;)< 

' M—u y 

+0{n~^) 

< e{M -oj,t) + - , 



n — Xm-u 



-t 



for some absolute constant 6 > 0. So 
n-X{H) 



Qm < 

and then, using (72), 



t + 1 and H £ S) < - w,t) H — , 



Pr 

M 



n - X{H) 



< I e{M — LU^t) -\ — ] ( 1 + 



^ T 1 and H G S 
5Au log n 



Therefore, recalling (71), 



0nV2 



— t + 1 j < f e(Af — cj,t)n — j (l“h 



hAuj log n 
6rAl‘^ 



+n‘ 



-K 



< e{M — cj, t) + 



6Acu log n 



In other words,

\[ e(M,t-1) \le e(M-\omega,t) + \frac{6A\omega\log n}{\theta n^{1/2}} \]

for all $M \in [(1-u)m, m]$.



□ 



4.6. Step 6. Completion of the proof of Theorem 2. 

Iterating the inequality (73) $t \le t_n$ times, starting from $M = m$ and $t = 1$, we obtain

\[ e(m,0) \le e(m - t\omega, t) + t\cdot\frac{6A\omega\log n}{\theta n^{1/2}}. \tag{74} \]
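The passage from (73) to (74) is a telescoping argument. As a sketch only, write $c_n$ for the additive term on the right of (73) (a shorthand introduced here, not the authors' notation), and assume that every intermediate value $m - j\omega$, $0 \le j \le t$, stays inside the range of $M$ allowed by Lemma 4.8:

```latex
\begin{align*}
e(m,0) &\le e(m-\omega,1) + c_n \\
       &\le e(m-2\omega,2) + 2c_n \\
       &\;\;\vdots \\
       &\le e(m-t\omega,t) + t\,c_n .
\end{align*}
```

Each line applies (73) once, with $M$ lowered by $\omega$ and the second argument raised by one.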

Take t = /3 > 0 being small. Then, using the definition of e(-,-) and 

(31), we obtain that e(m — tu, t) 0. And the second term on the RHS of (74) is 
q(^-o. 3+^ log2 ^) _ o{l)^ if /? < 0.3. 

We conclude that whp $\mu^*(G_{n,m})$ equals $\lfloor (n - X_m)/2 \rfloor$, so that $\hat G(G_{n,m})$ has
a perfect matching. Since $X_m$ is asymptotically Poisson($\lambda$), this proves Theorem 2.



References 

[1] J. Aronson, A.M. Frieze and B.G. Pittel, Maximum matchings in sparse random 
graphs: Karp-Sipser re-visited. Random Structures and Algorithms 12 (1998) 111- 
178. 







[2] B. Bollobás, Random graphs, in Combinatorics (H.N.V. Temperley, Ed.), London Mathematical Society Lecture Note Series 52, Cambridge University Press (1981) 80-102.

[3] B. Bollobás, T. Fenner and A.M. Frieze, Hamilton cycles in random graphs with minimal degree at least k, in A tribute to Paul Erdős, edited by A. Baker, B. Bollobás and A. Hajnal, (1988) 59-96.

[4] B. Bollobás, C. Cooper, T. Fenner and A.M. Frieze, On Hamilton cycles in sparse random graphs with minimum degree at least k, Journal of Graph Theory 34 (2000) 42-59.

[5] B. Bollobás and A.M. Frieze, On matchings and Hamiltonian cycles in random graphs, Annals of Discrete Mathematics 28 (1985) 23-46.

[6] R. Durrett, Probability: Theory and examples, Wadsworth and Brooks/Cole, 1991.

[7] P. Erdős and A. Rényi, On the evolution of random graphs, Publ. Math. Inst. Hungar. Acad. Sci. 5 (1960) 17-61.

[8] P. Erdős and A. Rényi, On random matrices, Publ. Math. Inst. Hungar. Acad. Sci. 8 (1964) 455-461.

[9] P. Erdős and A. Rényi, On the existence of a factor of degree one of a connected random graph, Acta Math. Acad. Sci. Hungar. 17 (1966) 359-368.

[10] A.M. Frieze, Maximum matchings in a class of random graphs, Journal of Combinatorial Theory, Series B 40 (1986) 196-212.

[11] A.M. Frieze, Perfect matchings in random bipartite graphs with minimal degree at least 2, to appear.

[12] A.M. Frieze and B. Pittel, Probabilistic analysis of an algorithm in the theory of markets in indivisible goods, Annals of Applied Probability 5 (1995) 768-808.

[13] M. Karoński and B. Pittel, Existence of a perfect matching in a random $(1+e^{-1})$-out bipartite graph, J. Comb. Theory B 88 (2003) 1-16.

[14] R.M. Karp and M. Sipser, Maximum matchings in sparse random graphs, Proceedings of the 22nd Annual IEEE Symposium on Foundations of Computing (1981) 364-375.

[15] J.H. Kim and N.C. Wormald, Random matchings which induce Hamilton cycles, and Hamiltonian decompositions of random regular graphs, Journal of Combinatorial Theory, Series B 81 (2001) 20-44.

[16] V.F. Kolchin, Random mappings, Optimization Software, New York, 1986.

[17] L. Lovász, Combinatorial problems and exercises, Second edition, North-Holland, 1993.

[18] C.J.H. McDiarmid, On the method of bounded differences, Surveys in Combinatorics, 1989, (J. Siemons ed.), Cambridge University Press (1989) 148-188.

[19] B.D. McKay, Asymptotics for symmetric 0-1 matrices with prescribed row sums, Ars Combinatoria 19A (1985) 15-26.

[20] B.G. Pittel, Paths in a random digital tree: limiting distributions, Advances in Applied Probability 18 (1986) 139-155.

[21] B. Pittel and N. Wormald, Counting connected graphs inside out, J. Comb. Theory B, to appear.

[22] R.W. Robinson and N.C. Wormald, Almost all regular graphs are Hamiltonian, Random Structures and Algorithms 5 (1994) 363-374.

[23] D.W. Walkup, Matchings in random regular bipartite graphs, Discrete Mathematics 31 (1980) 59-64.







APPENDIX 

A. Enumerating bipartite graphs 

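Consider the bipartite graphs with vertex bipartition $R \cup C$ (Rows and Columns), $R = [\nu_1]$ and $C = [\nu_2]$. Given $\mu$, the $\nu_1$-tuple $\mathbf{a}$, and the $\nu_2$-tuple $\mathbf{b}$ of nonnegative integers $a_i$, $i \in [\nu_1]$, $b_j$, $j \in [\nu_2]$, let $N(\mathbf{a},\mathbf{b})$ denote the total number of the bipartite graphs with the row degree sequence $\mathbf{a}$ and the column degree sequence $\mathbf{b}$. Using the bipartite version of the pairing model, we see that

\[ N(\mathbf{a},\mathbf{b}) \le N^*(\mathbf{a},\mathbf{b}); \qquad N^*(\mathbf{a},\mathbf{b}) := \frac{\mu!}{\prod_{i\in[\nu_1]} a_i!\,\prod_{j\in[\nu_2]} b_j!}. \tag{75} \]

The fudge factor, i.e. the ratio

\[ F(\mathbf{a},\mathbf{b}) = \frac{N(\mathbf{a},\mathbf{b})}{N^*(\mathbf{a},\mathbf{b})}, \tag{76} \]

is the probability that the uniformly random pairing is graph induced. A sharp asymptotic formula for $F(\mathbf{a},\mathbf{b})$ has been a subject of many papers. A culmination point is [19] by McKay who proved that if $D^4/\mu \to 0$, $D$ being the maximum degree, then

\[ N(\mathbf{a},\mathbf{b}) = N^*(\mathbf{a},\mathbf{b}) \exp\left(-\tfrac{1}{2}\lambda(\mathbf{a})\lambda(\mathbf{b}) + O(D^4/\mu)\right), \tag{77} \]

\[ \lambda(\mathbf{a}) = \frac{1}{\mu}\sum_{i\in[\nu_1]} a_i(a_i - 1), \tag{78} \]

\[ \lambda(\mathbf{b}) = \frac{1}{\mu}\sum_{j\in[\nu_2]} b_j(b_j - 1). \tag{79} \]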



The formulas (75) and (77) are instrumental in asymptotic evaluation (estimation) 
of the total number of bipartite graphs with a given number of edges and certain 
restrictions on the degree sequence. 
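For very small degree sequences the pairing-model bound can be compared with an exhaustive count. The sketch below is only an illustration: it reads (75) as $N^* = \mu!/(\prod_i a_i!\prod_j b_j!)$, and the function names `pairing_bound`, `count_bipartite_graphs` and `fudge_factor` are hypothetical helpers introduced here, not part of the paper.

```python
from fractions import Fraction
from itertools import combinations, product
from math import factorial


def pairing_bound(a, b):
    """N*(a, b) = mu! / (prod a_i! * prod b_j!), the reading of (75) assumed here."""
    mu = sum(a)
    assert mu == sum(b), "row and column degrees must have the same total"
    denom = 1
    for d in list(a) + list(b):
        denom *= factorial(d)
    return Fraction(factorial(mu), denom)


def count_bipartite_graphs(a, b):
    """Brute-force N(a, b): simple bipartite graphs with the given degree sequences."""
    cells = list(product(range(len(a)), range(len(b))))
    count = 0
    for edges in combinations(cells, sum(a)):
        row = [0] * len(a)
        col = [0] * len(b)
        for i, j in edges:
            row[i] += 1
            col[j] += 1
        if row == list(a) and col == list(b):
            count += 1
    return count


def fudge_factor(a, b):
    """F(a, b) = N(a, b) / N*(a, b), as in (76)."""
    return Fraction(count_bipartite_graphs(a, b)) / pairing_bound(a, b)
```

The enumeration is exponential in the number of vertex pairs, so it is usable only as a sanity check on toy inputs such as `fudge_factor((2, 2), (2, 2))`.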

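Neglecting for now the fudge factor in (77),

\[ N_{\mathbf{c},\mathbf{d}}(\boldsymbol{\nu},\mu) \le \sum_{\substack{a_i \ge c_i,\ b_j \ge d_j \\ \sum_i a_i = \mu,\ \sum_j b_j = \mu}} N^*(\mathbf{a},\mathbf{b}). \tag{80} \]

In order to rewrite (80) in a more manageable way, we observe that

\[ \sum_{\substack{a_i \ge c_i,\ b_j \ge d_j \\ \sum_i a_i = \mu,\ \sum_j b_j = \mu}} \frac{1}{\prod_i a_i! \cdot \prod_j b_j!} = [x^\mu y^\mu]\, G_{\mathbf{c}}(x) H_{\mathbf{d}}(y). \]

Therefore (80) becomes

\[ N_{\mathbf{c},\mathbf{d}}(\boldsymbol{\nu},\mu) \le \mu!\,[x^\mu y^\mu]\, G_{\mathbf{c}}(x) H_{\mathbf{d}}(y) \tag{81} \]

\[ = \mu!\,(2\pi i)^{-2} \oint_{|x| = r_1} x^{-\mu-1} G_{\mathbf{c}}(x)\,dx \cdot \oint_{|y| = r_2} y^{-\mu-1} H_{\mathbf{d}}(y)\,dy \tag{82} \]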







for all $r_1, r_2 > 0$. Using an inequality (Pittel [20])

\[ |f_t(z)| \le f_t(|z|) \exp\left( -\frac{|z| - \mathrm{Re}\, z}{t+1} \right), \tag{83} \]

(4), (5), and the fact that

\[ |z| - \mathrm{Re}\, z = r(1 - \cos\theta) \ge c r \theta^2, \quad \text{when } z = re^{i\theta},\ \theta \in (-\pi,\pi], \]

we see from (82), after a straightforward estimation, that






1 



Gc(ri) 



1 






^.(84) 



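Here and elsewhere $A \le_b B$ means that $A = O(B)$, uniformly for all feasible parameters that determine the values of $A$ and $B$. In the sequel we consider only $\max_i c_i = O(1)$, $\max_j d_j = O(1)$, in which case the bound (84) simplifies to

\[ N_{\mathbf{c},\mathbf{d}}(\boldsymbol{\nu},\mu) \le_b \mu! \cdot \frac{G_{\mathbf{c}}(r_1)}{r_1^{\mu}} (\nu_1 r_1)^{-1/2} \cdot \frac{H_{\mathbf{d}}(r_2)}{r_2^{\mu}} (\nu_2 r_2)^{-1/2}. \tag{85} \]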



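The task of determining the "best" values of $r_1$ and $r_2$ and incorporating the left-out fudge factor will be made easier by looking at the above through probabilistic lenses.

Fix $r_1, r_2 > 0$ and introduce the independent random variables $Y_i$, $Z_j$, with the distributions

\[ \Pr(Y_i = \ell) = \frac{r_1^{\ell}}{\ell!\, f_{c_i}(r_1)}, \quad \ell \ge c_i; \tag{86} \]

\[ \Pr(Z_j = \ell) = \frac{r_2^{\ell}}{\ell!\, f_{d_j}(r_2)}, \quad \ell \ge d_j; \tag{87} \]

so that, in distribution, $Y_i$ is Poisson($r_1$) conditioned on $\{\text{Poisson}(r_1) \ge c_i\}$, and $Z_j$ is Poisson($r_2$) conditioned on $\{\text{Poisson}(r_2) \ge d_j\}$. In short, $Y_i = \mathrm{Po}(r_1; \ge c_i)$ and $Z_j = \mathrm{Po}(r_2; \ge d_j)$. Now (81) can be rewritten as

\[ N_{\mathbf{c},\mathbf{d}}(\boldsymbol{\nu},\mu) \le \mu! \cdot \frac{G_{\mathbf{c}}(r_1)}{r_1^{\mu}} \Pr\Big(\sum_i Y_i = \mu\Big) \times \frac{H_{\mathbf{d}}(r_2)}{r_2^{\mu}} \Pr\Big(\sum_j Z_j = \mu\Big). \tag{88} \]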



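Now the RHS expressions in (81), (82) and (88) are equal to each other and the RHS of inequality (85) bounds them all. Therefore,

\[ \sup_{\mu} \Pr\Big(\sum_i Y_i = \mu\Big) \le_b \frac{1}{\sqrt{\nu_1 r_1}}, \qquad \sup_{\mu} \Pr\Big(\sum_j Z_j = \mu\Big) \le_b \frac{1}{\sqrt{\nu_2 r_2}}. \tag{89} \]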







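Furthermore, (80) becomes equality when $N^*(\mathbf{a},\mathbf{b})$ is replaced by $N(\mathbf{a},\mathbf{b})$. So, analogously to (88),

\[ \begin{aligned}
N_{\mathbf{c},\mathbf{d}}(\boldsymbol{\nu},\mu) &= \mu! \sum_{\substack{a_i \ge c_i,\ b_j \ge d_j \\ \sum_i a_i = \mu,\ \sum_j b_j = \mu}} \frac{F(\mathbf{a},\mathbf{b})}{\prod_{i\in[\nu_1]} a_i!\, \prod_{j\in[\nu_2]} b_j!} \\
&= \mu!\, \frac{G_{\mathbf{c}}(r_1) H_{\mathbf{d}}(r_2)}{(r_1 r_2)^{\mu}} \sum_{\substack{a_i \ge c_i,\ b_j \ge d_j \\ \sum_i a_i = \mu,\ \sum_j b_j = \mu}} \Pr(\mathbf{Y} = \mathbf{a}, \mathbf{Z} = \mathbf{b})\, F(\mathbf{a},\mathbf{b}) \\
&= \mu!\, \frac{G_{\mathbf{c}}(r_1) H_{\mathbf{d}}(r_2)}{(r_1 r_2)^{\mu}}\, E\Big( F(\mathbf{Y},\mathbf{Z}) \cdot 1_{\{\sum_i Y_i = \mu\}} 1_{\{\sum_j Z_j = \mu\}} \Big),
\end{aligned} \tag{90} \]

where $F(\cdot,\cdot)$ is defined in (76). To make this formula useful, we need to show that, for a proper choice of $r_1, r_2$, asymptotically we can replace $\lambda(\mathbf{Y})\lambda(\mathbf{Z})$ in the formula (77) by $E(\lambda(\mathbf{Y}))E(\lambda(\mathbf{Z}))$.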

Prom now on let us assume that and 

fi~^ <b ri,r 2 <b log/i and that z/i,z /2 = (91) 



Since max^ c^, maxj dj are both 0(1), using the definition of Y^, Zj and the 
conditions on z^i, z ^2 and ri, r 2 , we have: for 0 < a' < a. 



Pr(max{max Y^, maxZj} > < ^Pr(Y^ > /j,^) + ^Pr(Zj > /x^) 

'^3 

^ 3 

Therefore, for a < 1/3, the expected value in (90), is given by 



Ei/,m 


= (l + 0(/x-i+^“))E^,^ + 0(e-^“ ); 


(92) 




= E (f*(y,z) • i{EiE=rfi{E,Zi=M}) ; 


(93) 


F*(a,b): 


= exp (-iA(a)A(b)) , 


(94) 



see (77). In particular, see (89), 

<b {viV2rir2)~^^'^ ■ 

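Let us estimate the effect of replacing $\lambda(\mathbf{Y})$, $\lambda(\mathbf{Z})$ in (94) by their expected values. To this end, let us introduce

\[ U_i = (Y_i)_2 - E((Y_i)_2), \qquad V_j = (Z_j)_2 - E((Z_j)_2). \]

Simple computation shows that $E((Y_i)_2)$ is of order $O(1 + r_1^2)$, whence of order $O(\log^2 \mu)$, and likewise $E(Z_j(Z_j - 1)) = O(\log^2 \mu)$. From $\lambda(\mathbf{Y}) = \mu^{-1}\sum_i (Y_i)_2$, $\lambda(\mathbf{Z}) = \mu^{-1}\sum_j (Z_j)_2$, it follows then that $E(\lambda(\mathbf{Y})), E(\lambda(\mathbf{Z})) = O(\log^2 \mu)$ and that after using the expansion

\[ ab - \bar a \bar b = (a - \bar a)(b - \bar b) + \bar a (b - \bar b) + \bar b (a - \bar a) \]

we have

\[ |\lambda(\mathbf{Y})\lambda(\mathbf{Z}) - E(\lambda(\mathbf{Y}))E(\lambda(\mathbf{Z}))| \le_b (\log^2 \mu)\,\Delta(\mathbf{Y},\mathbf{Z}) + \Delta^2(\mathbf{Y},\mathbf{Z}), \tag{95} \]

\[ \Delta(\mathbf{Y},\mathbf{Z}) = |\lambda(\mathbf{Y}) - E(\lambda(\mathbf{Y}))| + |\lambda(\mathbf{Z}) - E(\lambda(\mathbf{Z}))|. \]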







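Therefore, if we replace $F^*(\mathbf{Y},\mathbf{Z})$ in (93) by $\exp(-\tfrac{1}{2}E(\lambda(\mathbf{Y}))E(\lambda(\mathbf{Z})))$, then the
compensating factor is $\exp(O(\log^2 \mu\,\Delta(\mathbf{Y},\mathbf{Z}) + \Delta^2(\mathbf{Y},\mathbf{Z})))$. Furthermore, setting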
u = log^^ fi, we estimate 



Pr{\Ui\>u) < 

<b 

< 



' VlogV/ 

exp(-n(log®|i)). 



Likewise 



nui-,m>u) = Y1 [W2-E((ri)2) 



e:{e)2-E({y^)2)>^l 



r{/l\ 

fcM) 



< 



b ^1 



2eri 

logV 



< exp(-f2(log®/i)). 



(96) 



Let Ui = so that 11X^1 < u. Then (Azuma-Hoeffding inequality), for 

every t > 0, 



Pr 





< 2 exp 




Since Ef/^ = 0, from (96) we have 



^EU, 

i 

Therefore, for X > 1, 



'^E{Ui-,\Ui\>u) 



< exp(-fi(log® fj,)). 



Pr 





< J]]Pr(|?7i| > u) +Pr 

i ' 

< 5]Pr(|C/il>u)+Pr j 

i ' 

<b exp(-f](log® fj.)) + exp 



E«* 




^(lli - EUi 



>t- 






{t - exp(-Q(log^ jj.))y 
2v?vi 



= exp(-J^(logV)), (97) 

the latter inequality holding if t = log^ /i. An analogous inequality holds for 
Vj . Equivalently 



Pr (|A(Y) - E(A(Y))| > log^^ m) < exp(-n(V m)), 

Pr (|A(Z) - E(A(Z))| > logio < exp(-Q(log" fi)). 







Combining these bounds with (95) and (93), and denoting R = Yi, S = 
we get 

= (l+0(/i-'/2 logi2 ^))e-iEA(Y)EA(Z)p^ ^ Pj, ^ ^)+D^^^(98) 

\Du,^^\ <b exp(-fi(logV)). (99) 

In (98), the exponential factor is exp(— 0(log^ /i)), and, by (89) and the conditions 
on /i, z/j,ri, the product of the probabilities is of order {y\TiV 2 V^^I'^ ^ the latter 
being fl((/ilog//)“^). The resulting bound makes the remainder relatively 
negligible, so that 

g-iEA(Y)EA(Z) 

The power of this bound is due to wide range of the parameters for which it 
holds. However, we will also need an asymptotic formula for E* and this requires 
asymptotic formulas for the local probabilities, rather than their upper bounds. 

Intuitively, we stand a better chance of achieving this goal when the parameters $r_1, r_2$ are such that the events $\{\sum_i Y_i = \mu\}$ and $\{\sum_j Z_j = \mu\}$ have "sizeable" probabilities. What better candidates than $r_1 = \rho_1$ and $r_2 = \rho_2$ for which

\[ \sum_{i=1}^{\nu_1} E Y_i = \mu, \qquad \sum_{j=1}^{\nu_2} E Z_j = \mu. \]

Explicitly, using (86) and (87), $\rho_1$ and $\rho_2$ are the roots of

\[ \sum_{i=1}^{\nu_1} \frac{x f_{c_i - 1}(x)}{f_{c_i}(x)} = \mu \tag{100} \]

and

\[ \sum_{j=1}^{\nu_2} \frac{x f_{d_j - 1}(x)}{f_{d_j}(x)} = \mu \tag{101} \]

respectively ($f_t(z) := e^z$, for $t < 0$). For $x \to 0$, the LHS of (100) and (101) approach $\sum_i c_i \le \mu$ and $\sum_j d_j \le \mu$, respectively. Each LHS is strictly increasing, asymptotic to $\nu_1 x$ and $\nu_2 x$ respectively, as $x \to \infty$. Assuming that $\sum_i c_i < \mu$ and $\sum_j d_j < \mu$, we see that the positive roots $\rho_1$ and $\rho_2$ exist uniquely, and that $\rho_i \le \mu/\nu_i$. Assuming from now that $\mu = O(\nu_i \log \nu_i)$, $i = 1,2$, we obtain that $\rho_i = O(\log \mu)$, which puts $\rho_1$ and $\rho_2$ into the set of feasible (meeting (91)) $r_1$ and $r_2$, respectively.
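The roots of (100) and (101) are easy to approximate numerically, because each left-hand side is increasing in $x$. The following is a minimal sketch, assuming the reading $E\,\mathrm{Po}(x; \ge c) = x f_{c-1}(x)/f_c(x)$ with $f_t(z) = \sum_{\ell \ge t} z^\ell/\ell!$; the function names and the bisection bracket $[0, \mu/\nu]$ are choices made here for illustration.

```python
from math import factorial


def f(t, z, terms=400):
    """f_t(z) = sum over l >= t of z^l / l!; for t <= 0 this is e^z."""
    t = max(t, 0)
    term = z ** t / factorial(t)
    total = 0.0
    for l in range(t + 1, t + 1 + terms):
        total += term
        term *= z / l  # next term z^l / l!
    return total


def mean_truncated_poisson(c, x):
    """Mean of a Poisson(x) variable conditioned to be at least c."""
    return x * f(c - 1, x) / f(c, x)


def solve_rho(c_list, mu, iterations=200):
    """Bisection for the positive root of sum_i x f_{c_i-1}(x)/f_{c_i}(x) = mu."""
    nu = len(c_list)
    if not sum(c_list) < mu:
        raise ValueError("need sum(c_i) < mu for a positive root")
    lo, hi = 0.0, mu / nu  # the root cannot exceed mu / nu
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if sum(mean_truncated_poisson(c, mid) for c in c_list) < mu:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

With all $c_i = 0$ the variables are plain Poisson and the root is $\mu/\nu$; positive lower bounds $c_i$ push the root below that value. The truncated series in `f` is adequate only for moderate arguments (roughly $z$ up to a hundred).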

How do the probabilities in (98) behave if ri = pi, r 2 == P 2 ? For Yi = Po(p; > 
2) and p = uE{Y), it was proved in [1] that 




p + a 



l + 0(a^(pi/) ^) 

(27Ti/Var(y))^/2 ’ 



provided that pu oo, and o? /{pv) -> 0. (The condition pv oo is equivalent 
to i^Var(y) ^ oo since Var(y) = 0(p))- Suppose that in the present context 
uipi oo, i.e. '^iy3Lr(Yi) oo, which is equivalent to p - oo. Only 

simple modifications of the proof in [1] are needed to prove 







Lemma A.l. If X) • Var(yi) ^ 00 and the Ci are uniformly bounded, then 






l + 0{o?{vipi) 

(27rE,Var(y,))i/2’ 



if o?{uipi) ^ ^ 0. An analogous formula holds for Pr Z, = p + a^. 

Proof. Let W — y. As usual, we start with the inversion formula 



Pr(VF = r) = 



f dx 

J — 7T 



where t = p + a. Let 






^> = E^7y = e<‘'>«) 



and consider first |a;| > Using inequality (83) we estimate 



/c.(e“Pi: 



<—[ fTe^dcosx-l)/(c* + l)^^ < g-syv3_ (103) 

27T fj[ 

For |a;| < putting p = /9ie“ and using Pi/c^ (pi)//c< (Pi) = P, d/dx = 

ir]dl dr] we expand as a Taylor series around x = 0 to obtain 



—irx + 5^ log 



fce{pi) 



= -rox - Y 2^ D 



Pfcei'n) 

fciiv) 



ceVU / \t]=p 



vfcAv) 

fciiv) 



+0 x^^v^ 



vfLiv) 



; (104) 



here f/ = with x being between 0 and x, and D — r]{d/drj). Now, the coeffi- 

cients of x^/2, x^/3! and x^ are Var(TT), O (Var(W')) , O {Va.r{W)) respectively, 
and Var(M^) is of order Ei. So the second and the third terms in (104) are o(l) 

uniformly for \x\ < Therefore 










where 



L-iL 



1 



V27rVar(T^) 



g-iaa:-Var(VV)x^/2 

+ 0 ''“' + ''' 



e; 



3/2 



(106) 



L- 






f pif'cipi) 



— 1 / 

1) 



^3g-Var(H^)xV2^^ 



= o 



fsi [ 



, |3g-Var(iy)xV2^^ 






(a > 0 is an absolute constant), and 

^4g-Var(iy)xV2^^ 



(107) 



/ = O E. / 



/|x|<E- 



5/12 



o 



^3/2 



(108) 



Using (102)-(108), we arrive at 
Pr {W = t) 



^/2^^vVar{W) 



1 + 0 



0^ + 1 
Si 



□ 



Suppose, say, that uipi is bounded, or equivalently that ai := pi — 

0(1). Extending an argument in [21] (which covers the case of identically dis- 
tributed 1^), we can show that 






CTi! 



(109) 



An analogous relation holds for Pr(*S' = p) if 02 dj — 0(1). Clearly then, 

regardless of the behavior of (Ji,cr 2 , in (98) the remainder term is negligible 
compared to the explicit term. □ 



B. Concentration of W. 

We need to prove the following result. 

Let $S$ be a set with $|S| = N$. Let $\Omega$ be the set of $N!$ permutations of $S$. Let
$\omega$ be chosen uniformly from $\Omega$.

Let $Z = Z(\omega)$ be such that $|Z(\omega) - Z(\omega')| \le 1$ when $\omega'$ is obtained from $\omega$
by interchanging two elements of the permutation.

Lemma B.1.

\[ \Pr(|Z - EZ| \ge t) \le 2\exp\left(-\frac{t^2}{2N}\right). \]

Proof. For a fixed permutation $(x_1, x_2, \ldots, x_N)$ and $0 \le i \le N$ let

\[ Z_i(x_1, x_2, \ldots, x_i) = E(Z \mid \omega_j = x_j,\ 1 \le j \le i). \]







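Clearly the sequence $Z_0, Z_1, \ldots, Z_N$ is a martingale. To apply the Azuma-Hoeffding inequality, we need to show that

\[ |Z_i(x_1, x_2, \ldots, x_i) - Z_i(x_1, x_2, \ldots, x_{i-1}, x'_i)| \le 1 \tag{110} \]

for all $i$-tuples $(x_1, x_2, \ldots, x_i)$ with distinct components, and $x'_i \ne x_1, \ldots, x_i$. (Indeed, the inequality (110) readily implies that $|Z_{i-1} - Z_i| \le 1$.)

Consider

\[ \Omega_i = \{\omega \in \Omega : \omega_j = x_j,\ 1 \le j \le i\} \]

and

\[ \Omega'_i = \{\omega \in \Omega : \omega_j = x_j,\ 1 \le j \le i-1,\ \omega_i = x'_i\} \]

and the map $f : \Omega_i \to \Omega'_i$ defined as follows. If $\omega = x_1 x_2 \ldots x_{i-1} x_i y_{i+1} \ldots y_N$ and $y_j = x'_i$ then $f(\omega) = x_1 x_2 \ldots x_{i-1} x'_i y_{i+1} \ldots y_{j-1} x_i y_{j+1} \ldots y_N$. Observe that $f$ is a bijection and that

\[ |Z_i(x_1, x_2, \ldots, x_i) - Z_i(x_1, x_2, \ldots, x'_i)| = \frac{\left|\sum_{\omega \in \Omega_i} \big(Z(\omega) - Z(f(\omega))\big)\right|}{(N-i)!} \le 1. \]

□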



Alan Frieze 

Department of Mathematical Sciences, Carnegie Mellon University, Pittsburgh PA 
15213. 

Boris Pittel 

Department of Mathematics, Ohio State University, Columbus OH 43210.




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Estimating the Growth Constant of Labelled 
Planar Graphs 

Omer Gimenez and Marc Noy 



ABSTRACT: Let $G_n$ be the number of labelled planar graphs on $n$ vertices and $\gamma = \lim_{n\to\infty} (G_n/n!)^{1/n}$. It is known that $26.1848 < \gamma < 30.0606$. In this paper we sharpen these bounds to $27.22685 < \gamma < 27.22688$. The proof is based on recent results of Bender, Gao and Wormald [Elec. J. Combinatorics 9 (2002) R43], and on singularity analysis of generating functions.



1. Introduction 

Let $G_n$ be the number of labelled planar graphs on $n$ vertices. We stress the fact that we are counting planar graphs, that is, graphs that can be embedded in the plane, without considering a particular embedding. A subadditivity argument shows that the following limit exists:

\[ \gamma = \lim_{n\to\infty} (G_n/n!)^{1/n}. \]

This was first observed by Denise, Vasconcellos and Welsh [5]. Recently, a number of authors have pursued the study of random planar graphs [7, 11, 13]. One of the problems mentioned in these references is to obtain good bounds for the value of the constant $\gamma$.

A lower bound results from the work of Bender, Gao and Wormald [1]. They show that, if $B_n$ is the number of 2-connected labelled planar graphs, then $\lim_{n\to\infty} (B_n/n!)^{1/n} \approx 26.1848$; thus $\gamma$ is at least this number. On the other hand, by means of an encoding scheme for (unlabelled) planar graphs, Bonichon, Gavoille and Hanusse [2] have shown that $\gamma \le 2^{5.007} = 32.1556$. Very recently, this upper bound has been reduced by Bonichon et al. [3] to $2^{4.9098} = 30.0606$.

The purpose of this paper is to sharpen these bounds.

Theorem 1.1. If $G_n$ is the number of labelled planar graphs on $n$ vertices, and $\gamma = \lim_{n\to\infty} (G_n/n!)^{1/n}$, then

\[ 27.22685 < \gamma < 27.22688. \]

The proof is based on singularity analysis of generating functions, as described in the forthcoming book by Flajolet and Sedgewick [6]. If $C_n$ is the number of connected labelled planar graphs on $n$ vertices, there are simple equations linking the generating functions $B(x) = \sum B_n x^n/n!$, $C(x) = \sum C_n x^n/n!$ and $G(x) = \sum G_n x^n/n!$. The singularities of $B(x)$ were determined in [1], and from this we are able to compute the dominant singularities of $C(x)$ and $G(x)$, which are both equal to $\gamma^{-1}$. Since we do not have explicit analytic expressions for the corresponding singularities, in order to determine them accurately we have to rely on numerical methods.







In Section 2 we review the preliminaries needed for the proof, and we also 
recall previous work on the enumeration of planar graphs and planar maps. In 
Section 3 we present the proof of our main result. We conclude with some remarks 
and open questions. 

All graphs in this paper are simple and labelled, so we omit these qualifiers from
now on. As a rule, the variable $x$ in a generating function marks vertices, and the variable
$y$ marks edges.



2. Preliminaries 

Let $G_n$, $C_n$ and $B_n$ denote, respectively, the number of planar graphs, connected planar graphs, and 2-connected planar graphs on $n$ vertices. We introduce the following exponential generating functions:

\[ G(x) = \sum_{n\ge 0} G_n \frac{x^n}{n!}, \qquad C(x) = \sum_{n\ge 0} C_n \frac{x^n}{n!}, \qquad B(x) = \sum_{n\ge 0} B_n \frac{x^n}{n!}. \]

These series are related as follows.

Lemma 2.1. The series $G(x)$, $C(x)$ and $B(x)$ satisfy the following equations:

\[ G(x) = \exp(C(x)), \qquad xC'(x) = x\exp\big(B'(xC'(x))\big), \]

where $C'(x) = dC(x)/dx$ and $B'(x) = dB(x)/dx$.

Proof. The first equation is standard, given the fact that a planar graph is a set of connected planar graphs.

The second equation follows from a standard argument on the decomposition of a connected graph into 2-connected components; see, for instance, [9, p. 10]. □
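The first equation of Lemma 2.1 can be checked on small cases. Differentiating $G = \exp(C)$ gives $G' = C'G$, which on coefficients reads $G_n = \sum_{k=1}^{n} \binom{n-1}{k-1} C_k G_{n-k}$. The sketch below applies this recurrence; the input values $C_1,\ldots,C_5 = 1, 1, 4, 38, 727$ are an assumption supplied here (connected labelled graphs on at most 5 vertices, with $K_5$ removed as the only non-planar one), not data taken from the paper.

```python
from math import comb


def graphs_from_connected(c):
    """Given c[n] = number of connected labelled structures on n vertices
    (with c[0] = 0), return g[n] for G(x) = exp(C(x)) via G' = C' G."""
    n_max = len(c) - 1
    g = [1] + [0] * n_max
    for n in range(1, n_max + 1):
        g[n] = sum(comb(n - 1, k - 1) * c[k] * g[n - k] for k in range(1, n + 1))
    return g


# Assumed small values of C_n for n = 0..5 (see the lead-in).
connected_planar = [0, 1, 1, 4, 38, 727]
planar = graphs_from_connected(connected_planar)
```

For $n \le 4$ every graph is planar, so the output should agree with $2^{\binom{n}{2}}$ there, and for $n = 5$ it should be one less than $2^{10}$.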

Let $B_{n,q}$ be the number of 2-connected planar graphs with $n$ vertices and $q$ edges, and let

\[ B(x,y) = \sum B_{n,q}\, y^q \frac{x^n}{n!} \]

be the corresponding bivariate generating function. Notice that $B(x,1) = B(x)$. Define the series $M(x,y)$ by means of the expression

\[ M(x,y) = x^2 y^2 \left( \frac{1}{1+xy} + \frac{1}{1+y} - 1 - \frac{(1+U)^2 (1+V)^2}{(1+U+V)^3} \right), \tag{1} \]

where $U(x,y)$ and $V(x,y)$ are algebraic functions given by

\[ U = xy(1+V)^2, \qquad V = y(1+U)^2. \]

The next result from [1] is essential in what follows.

Theorem 2.2. We have

\[ \frac{\partial B(x,y)}{\partial y} = \frac{x^2}{2} \left( \frac{1 + D(x,y)}{1+y} - 1 \right), \tag{2} \]

where $D = D(x,y)$ is defined implicitly by $D(x,0) = 0$ and

\[ \frac{M(x,D)}{2x^2 D} - \log\left( \frac{1+D}{1+y} \right) + \frac{xD^2}{1+xD} = 0. \tag{3} \]

Moreover, the coefficients of $D(x,y)$ are nonnegative.




Growth of planar graphs 






From the equations above, it is shown in [1] that the radius of convergence of the series $B(x) = B(x,1)$ is equal to $R \approx 0.0381910976694$. Moreover, $R$ is given by explicit analytic expressions and can be computed to any degree of accuracy.

Let us comment on the previous equations, which are based on the well-developed theory of counting rooted planar maps (see, for instance, [8]). The algebraic generating function $M$ corresponds to 3-connected planar maps, which were enumerated by Mullin and Schellenberg [12]. The next ingredient is Whitney's theorem: a 3-connected planar graph has a unique embedding in the sphere. Hence, counting 3-connected planar graphs essentially amounts to counting 3-connected planar maps. Finally, the decomposition of a 2-connected graph into 3-connected components (see, for instance, [14]) allows one to connect the generating functions $B$ and $M$. This decomposition is encoded in equations (2) and (3), which were found by Walsh [15] using the so-called networks.

If we are interested in the bivariate generating functions $C(x,y)$ and $G(x,y)$, where $y$ marks edges, the corresponding equations are

\[ G(x,y) = \exp(C(x,y)), \qquad x\frac{\partial}{\partial x}C(x,y) = x\exp\left( \frac{\partial}{\partial x}B\Big(x\frac{\partial}{\partial x}C(x,y),\, y\Big) \right). \]

The reason is that the parameter "number of edges" is additive under taking connected components and 2-connected components.

Next we turn to some analytical considerations. Let $f(x)$ be a function analytic at 0 with real non-negative Taylor coefficients and such that $f(0) = 0$. Let $\psi(u)$ be its functional inverse, that is, $\psi(f(x)) = x$, and assume that $R > 0$ is the radius of convergence of $\psi$. In order to determine the radius of convergence $\rho$ of $f$, there are two cases to consider (see Proposition IV.4 in [6]):

(i) There exists $\tau \in (0,R)$ (necessarily unique) such that $\psi'(\tau) = 0$. Then $\rho = \psi(\tau)$.

(ii) We have $\psi'(u) \ne 0$ for all $u \in (0,R)$. Then $\rho = \sup_{0<u<R} \psi(u)$.

The rationale in (i) is that $\psi(u)$ ceases to be invertible at $\tau$ since the derivative vanishes; hence the inverse function $f(x)$ ceases to be analytic at $\psi(\tau)$. In case (ii) there is no obstruction to the inversion of $\psi(u)$ and the radius of convergence of $f(x)$ is as stated.

Notice that in both cases we have $\rho = \sup_{0<u<R} \psi(u)$, in the first case because $\psi$ has an absolute maximum at $\tau$; the following easy consequences can be used to bound $\rho$ from above and from below.

Fact 1. If $h(u)$ is a function such that $\psi(u) \le h(u)$ for all $u \in (0,R)$, then $\rho \le \sup_{0<u<R} h(u)$.

Fact 2. If $\lambda < \psi(u)$ for some $\lambda \in \mathbb{R}$ and $u \in (0,R)$, then $\lambda < \rho$.
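Case (i) can be illustrated on a classical example that is not part of this paper: the tree function $T(x) = x e^{T(x)}$ has functional inverse $\psi(u) = u e^{-u}$, whose derivative vanishes at $\tau = 1$, giving the well-known radius $\rho = \psi(1) = e^{-1}$. The sketch below locates $\tau$ by bisection; the helper name `radius_via_inverse` and the search interval are assumptions made for this illustration, and the code presumes that $\psi'$ changes sign at most once on the interval.

```python
from math import exp


def psi(u):
    """Functional inverse of the tree function T(x) = x * exp(T(x))."""
    return u * exp(-u)


def dpsi(u):
    """Derivative of psi."""
    return (1.0 - u) * exp(-u)


def radius_via_inverse(psi, dpsi, upper, tol=1e-14):
    """Radius of convergence of the inverse of psi, searching (0, upper).

    Case (ii): if psi' is still positive at `upper`, return psi(upper).
    Case (i): otherwise bisect for the zero tau of psi' and return psi(tau).
    """
    lo, hi = 0.0, upper
    if dpsi(hi) > 0:
        return psi(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if dpsi(mid) > 0:
            lo = mid
        else:
            hi = mid
    return psi((lo + hi) / 2)


rho = radius_via_inverse(psi, dpsi, upper=5.0)
```

In the spirit of Fact 2, any single value such as `psi(0.9)` already gives a strict lower bound for `rho`.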



3. Proof of Theorem 1.1 

From the definition of $\gamma$, it follows that $\gamma^{-1}$ is the radius of convergence of $G(x)$. Since $G(x) = \exp(C(x))$ and $\exp(x)$ is an entire function, $G(x)$ and $C(x)$ have the same radius of convergence; from now on we concentrate on $C(x)$.

In order to simplify the notation, we define a new series $F(x)$ as

\[ F(x) = xC'(x). \]







The second equation in Lemma 2.1 now takes the simpler form

\[ F(x) = x\exp\big(B'(F(x))\big). \tag{4} \]

Notice that the radii of convergence of $B'(x)$ and $F(x)$ are the same, respectively, as those of $B(x)$ and $C(x)$.

We define

\[ \psi(u) = u e^{-B'(u)}, \]

so that $F(x)$ and $\psi(u)$ are functional inverses. The radius of convergence of $\psi$ is the same as that of $B'(x)$, namely $R$. Our task is to estimate the radius of convergence $\rho = \gamma^{-1}$ of $F(x)$ from the fact that $\psi$ and $F$ are inverses of each other.

Lower bound. Let $B_{(k)}(x) = B_0 + B_1 x + \cdots + B_k x^k/k!$ be the truncation of the series $B(x)$ at order $k$. Since the $B_k$ are non-negative,

\[ B'_{(k)}(u) \le B'(u) \quad \text{for all } u \in (0,R). \]

Hence, if we define

\[ \psi_k(u) = u\exp\big(-B'_{(k)}(u)\big), \]

we have

\[ \psi(u) \le \psi_k(u) \quad \text{for all } u \in (0,R). \]

Since $B'_{(k)}(u)$ is a polynomial, the maximum value of $\psi_k(u)$ in the interval $[0,R]$ is easily computed: if the unique root $\tau_k$ of $\psi'_k(\tau_k) = 0$ is less than $R$, the maximum is $\psi_k(\tau_k)$; otherwise it is $\psi_k(R)$. For all values of $k$ we have been able to check, it turns out that $\tau_k > R$, hence the second alternative applies.

We have computed the values of $B_n$ up to $n = 25$ (they are also computed in [1]). We have $\psi_{24}(R) \approx 0.03672844872$, and from Fact 1 it follows that $\rho \le 0.03672844872$, hence $\gamma = \rho^{-1} > 27.22685$.
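The lower-bound argument can be tried with a much shorter truncation. The sketch below is illustrative only: it uses the value of $R$ quoted in Section 2 and assumes the small counts $B_2, \ldots, B_5 = 1, 1, 10, 237$ (with the single edge counted as 2-connected, as the block equation of Lemma 2.1 requires); these counts are supplied here and are not read off from the paper. The grid maximisation is a simple stand-in for the exact maximisation described above.

```python
from math import exp, factorial

R = 0.0381910976694  # radius of convergence of B(x), as quoted in Section 2

# Assumed values of B_n for n = 2..5 (see the lead-in); longer tables would sharpen the bound.
B_SMALL = {2: 1, 3: 1, 4: 10, 5: 237}


def b_prime_truncated(u, coeffs=B_SMALL):
    """Derivative of the truncated series: sum of B_n u^(n-1) / (n-1)!."""
    return sum(b * u ** (n - 1) / factorial(n - 1) for n, b in coeffs.items())


def psi_k(u, coeffs=B_SMALL):
    """psi_k(u) = u * exp(-B'_(k)(u)), an upper bound for psi(u) on (0, R)."""
    return u * exp(-b_prime_truncated(u, coeffs))


def gamma_lower_bound(coeffs=B_SMALL, grid=10_000):
    """Fact 1: rho <= sup psi_k on [0, R], hence gamma >= 1 / sup."""
    sup = max(psi_k(R * i / grid, coeffs) for i in range(grid + 1))
    return 1.0 / sup


bound = gamma_lower_bound()
```

Even this four-term truncation lands close to the bound stated in the text, which fits the remark that longer truncations bring only marginal gains.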

Upper bound. Given $u$ in $(0,R)$, we show next how to determine a number $\beta_u$ such that $\beta_u > B'(u)$. If we set $\lambda = u\exp(-\beta_u)$, then

\[ \lambda < u\exp(-B'(u)) = \psi(u) \]

and we can apply Fact 2 to bound $\rho$ from below.

We start by rewriting equation (2) as

\[ B(x,y) = \int_0^y P(x,t)\,dt, \]

where

\[ P(x,y) = \frac{x^2}{2}\left( \frac{1 + D(x,y)}{1+y} - 1 \right). \]

Note that, since $B$ is a power series with non-negative coefficients, so are its derivatives and the derivatives of $P(x,y) = \partial B(x,y)/\partial y$. Consequently we obtain that

\[ \frac{\partial^k B(x,y)}{\partial x^k} \ge 0, \qquad \frac{\partial^k P(x,y)}{\partial y^k} \ge 0 \quad \text{for all } k, \]

for positive values of $x$ and $y$.

Next we show how to compute values $B_1$ and $B_2$ such that

\[ B_1 < B(u) = \int_0^1 P(u,t)\,dt, \]

\[ B_2 > B(u+\varepsilon) = \int_0^1 P(u+\varepsilon,t)\,dt, \]






where $\varepsilon > 0$ is such that $u + \varepsilon < R$.

We compute $B_1$ and $B_2$ using numerical integration methods. We apply several Newton-Cotes quadratures as described, for instance, in [10]. These methods relate the error term to the evaluation of a certain derivative of the integrand, and by the positiveness of $\partial^k P/\partial y^k$ for positive values of $x$ and $y$ we know with certainty the sign of the error term; this is essential for us, as we are looking both for upper and lower bounds for the integrals.

We obtain the value $B_1$ by applying, for example, the repeated 3-point open rule. The error term $B(u) - B_1$ is the positive number $28h^5 f^{(4)}(\xi)/90$, where $f(y) = P(u,y)$, $h$ is the distance between successive evaluations of the integrand and $\xi$ is in $(0,1)$. To obtain $B_2$ we apply the repeated 5-point closed rule. The error term $B(u+\varepsilon) - B_2$ is now the negative number $-8h^7 f^{(6)}(\xi)/945$, where $f(y) = P(u+\varepsilon,y)$.
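The one-sided behaviour of the two quadrature rules can be seen on a simple integrand whose derivatives are all positive. The sketch below uses $e^y$ on $[0,1]$ as a stand-in for $P(u,y)$ (an assumption made purely for illustration, since evaluating $P$ requires solving (3) for $D$); the function names are hypothetical.

```python
from math import exp


def open_3point(f, a, b, panels):
    """Repeated 3-point open Newton-Cotes rule; each panel has width 4h.

    Per panel: (4h/3) * (2 f1 - f2 + 2 f3), error +(28/90) h^5 f''''(xi).
    """
    h = (b - a) / (4 * panels)
    total = 0.0
    for p in range(panels):
        x0 = a + 4 * h * p
        total += 2 * f(x0 + h) - f(x0 + 2 * h) + 2 * f(x0 + 3 * h)
    return 4 * h / 3 * total


def closed_5point(f, a, b, panels):
    """Repeated 5-point closed Newton-Cotes rule; each panel has width 4h.

    Per panel: (2h/45) * (7 f0 + 32 f1 + 12 f2 + 32 f3 + 7 f4),
    error -(8/945) h^7 f^(6)(xi).
    """
    h = (b - a) / (4 * panels)
    total = 0.0
    for p in range(panels):
        x0 = a + 4 * h * p
        total += (7 * f(x0) + 32 * f(x0 + h) + 12 * f(x0 + 2 * h)
                  + 32 * f(x0 + 3 * h) + 7 * f(x0 + 4 * h))
    return 2 * h / 45 * total


exact = exp(1) - 1
lower = open_3point(exp, 0.0, 1.0, panels=4)
upper = closed_5point(exp, 0.0, 1.0, panels=4)
```

With positive fourth and sixth derivatives the open rule underestimates and the closed rule overestimates the integral, which is the property the argument above relies on. In floating point this ordering is only trustworthy while the truncation errors dominate rounding, which is presumably why the authors work with 25 significant digits.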

These methods require the evaluation of $P(u,t)$ and $P(u+\varepsilon,t)$ for several values of $t$. This is not a problem, since $D$ is a function defined implicitly by equations (1) and (3), and we can apply classical methods to obtain evaluations of $D$ up to any precision required.

Finally, we set $\beta_u = (B_2 - B_1)/\varepsilon$. This yields an upper bound for $B'(u)$, since the positiveness of $\partial^2 B/\partial x^2$ implies that

\[ B'(u) < \frac{B(u+\varepsilon) - B(u)}{\varepsilon} < \frac{B_2 - B_1}{\varepsilon} = \beta_u. \]

We have performed these computations with the help of Maple using 25 significant digits, with $u = 0.038191096$, $\varepsilon = 1.6 \cdot 10^{-9}$ and $h = 1/30000$ for both integration methods. Using this data we have obtained that $\lambda > 0.036728410432$. By applying Fact 2 we obtain $\gamma = \rho^{-1} < 27.22688$.



4. Concluding remarks 

We have determined $\gamma$ with an accuracy of four decimal digits. Next we comment on the problems faced in order to increase the number of correct digits. For the lower bound one needs to compute exact values of $B_n$ for larger values of $n$, and this is indeed possible. However, we have worked with larger truncations $B_{(k)}(x)$ of the series $B(x)$ and we have observed only a marginal improvement on the lower bound.

To improve the upper bound, we need to estimate $B(u)$ and $B(u+\varepsilon)$ for $u$ close enough to $R$. But the closer $u$ is to $R$, the smaller $\varepsilon$ becomes, so we also need to obtain sharper estimates for $B(u)$ and $B(u+\varepsilon)$ in order to really improve the upper bound. On the whole this would require changing the integration method we have used, or evaluating $P(u,y)$ and $P(u+\varepsilon,y)$ at even more points; with our present algorithms this seems computationally infeasible. The main open problem here is whether $\gamma$ can be determined analytically.

A second, and more important, remark is the following. One of our aims when performing the computations for the upper bound was to obtain $\tau < R$ such that $\psi'(\tau) = 0$, that is, to show that $F(x) = xC'(x)$ corresponds to case (i) of Section 2. Then it would follow by singularity analysis that

\[ C_n \sim K\, n^{-5/2} \gamma^n n! \]

for some constant $K$. On the other hand, if $\tau$ does not exist then we would obtain, from the results in [1] and singularity analysis on $\psi$, that

\[ C_n \sim K'\, n^{-7/2} \gamma^n n!, \]

for some constant $K'$. Thus we have a dichotomy for the asymptotic behavior of $C_n$; deciding which is the true situation remains an open question.

With respect to the existence of $\tau$, notice that $\psi'(\tau) = 0$ is equivalent to
$\tau B''(\tau) = 1$. If $\tau \in (0,R)$ exists, then $B''(\tau) = \tau^{-1} > R^{-1} = 26.18\ldots$ But our

computations for u G {0,R — 10~^) give B^'{u) < 2.09. . . for Hence, if r exists, 
then it must be extremely close to the singularity R. 
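The condition for a vanishing derivative follows directly from the definition $\psi(u) = u e^{-B'(u)}$ of Section 3. As a short worked restatement:

```latex
\begin{align*}
\psi'(u) &= e^{-B'(u)} - u\,B''(u)\,e^{-B'(u)} = e^{-B'(u)}\bigl(1 - u\,B''(u)\bigr), \\
\psi'(\tau) = 0 &\iff \tau\,B''(\tau) = 1 \iff B''(\tau) = \tau^{-1}.
\end{align*}
```

Since $e^{-B'(u)}$ never vanishes, a root $\tau < R$ forces $B''(\tau) = \tau^{-1} > R^{-1}$.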

Finally, we remark that, if $g_n$ is the number of unlabelled planar graphs on $n$ vertices, McDiarmid, Steger and Welsh [11] have shown that $\gamma_u = \lim_{n\to\infty} (g_n)^{1/n}$ exists and that $\gamma \le \gamma_u$. Thus our result also gives 27.2268 as a new lower bound for $\gamma_u$.

5. Acknowledgements 

We are grateful to Philippe Flajolet for useful comments on the paper, in particu- 
lar, for pointing out the dichotomy mentioned in the conclusions. Thanks also to 
the anonymous referee for his criticisms. 

References 

[1] E. A. Bender, Z. Gao, N. C. Wormald, The number of 2-connected labelled planar graphs, Elec. J. Combinatorics 9 (2002), #43.

[2] N. Bonichon, C. Gavoille, N. Hanusse, An information-theoretic upper bound of planar graphs using triangulations, Springer Lecture Notes in Computer Science 2607 (2003), 499-510.

[3] N. Bonichon, C. Gavoille, N. Hanusse, D. Poulalhon, G. Schaeffer, Planar Graphs, via Well-Orderly Maps and Trees (preprint).

[4] M. Bodirsky, C. Gröpl and M. Kang, Generating Labeled Planar Graphs Uniformly at Random, in Proc. of ICALP 2003, LNCS 2719, 1095-1107.

[5] A. Denise, M. Vasconcellos, D. J. A. Welsh, The random planar graph, Congr. Numer. 113 (1996), 61-79.

[6] P. Flajolet, R. Sedgewick, Analytic Combinatorics (book in preparation), preliminary version available at http://algo.inria.fr/flajolet/

[7] S. Gerke, C. McDiarmid, On the Number of Edges in Random Planar Graphs, Comb. Prob. and Computing 13 (2004), 165-183.

[8] I. P. Goulden, D. M. Jackson, Combinatorial enumeration, John Wiley & Sons, Inc., New York (1983).

[9] F. Harary, E. Palmer, Graphical Enumeration, Academic Press, New York-London (1973).

[10] F. B. Hildebrand, Introduction to Numerical Analysis, Dover, New York (1987).

[11] C. McDiarmid, A. Steger, D. Welsh, Random Planar Graphs, preprint (2003).

[12] R. C. Mullin, P. J. Schellenberg, The enumeration of c-nets via quadrangulations, J. Combin. Theory 4 (1968), 259-276.

[13] D. Osthus, H. J. Prömel, A. Taraz, On random planar graphs, the number of planar graphs and their triangulations, J. Combin. Theory Ser. B 88 (2003), 119-134.

[14] W. T. Tutte, Graph theory (reprint of the 1984 original), Cambridge University Press, Cambridge (2001).

[15] T. R. S. Walsh, Counting labelled three-connected and homeomorphically irreducible two-connected graphs, J. Combin. Theory Ser. B 32 (1982), 1-11.

Omer Gimenez 

Universitat Politecnica de Catalunya 
omer.gimenez@upc.es 

Marc Noy 

Universitat Politecnica de Catalunya 
marc.noy@upc.es 




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



The Number of Spanning Trees in P4-Reducible 
Graphs 

Stavros D. Nikolopoulos and Charis Papadopoulos 

ABSTRACT: We present an efficient algorithm for determining the number of spanning trees in the class of $P_4$-reducible graphs, which are perfect graphs and generalize both the well-known class of cographs and the class of quasi-threshold graphs. In particular, for a $P_4$-reducible graph $G$ on $n$ vertices and $m$ edges, our algorithm computes the number of spanning trees of $G$ in $O(n+m)$ time and space, where the complexity of arithmetic operations is measured under the uniform cost criterion. The algorithm takes advantage of the modular decomposition tree of the input graph, which it gradually shrinks in a systematic fashion until it becomes a single vertex, while at the same time appropriately updating certain parameters whose product gives the desired number of spanning trees. The correctness of the algorithm is established through the Kirchhoff matrix tree theorem, and is also based on structural and algorithmic properties of the graphs with few $P_4$s. Our results generalize previous results and extend the family of graphs admitting linear-time algorithms for the number of their spanning trees.



1. Introduction 

The purpose of this paper is to study the problem of finding the number of spanning 
trees in the class of P4-reducible graphs, which are defined as the graphs for which 
no vertex belongs to more than one induced P4 [3]. We take advantage of the 
modular decomposition tree of the class of P4-reducible graphs in order to obtain 
an efficient solution of the above-mentioned problem. 

The modular decomposition of a graph G is a tree denoted as T(G); the leaves 
of T(G) are the vertices of G, and the set of leaves associated with the subtree 
rooted at an internal node induces a strong module of G. Thus, the modular 
decomposition tree T(G) represents all the strong modules of G. An internal node 
is labeled by either 0 (or p for parallel module), 1 (or s for series module), or 2 
(or n for neighborhood module). The module corresponding to a 0-node induces 
a disconnected subgraph of G, that of a 1-node induces a connected subgraph of 
G whose complement is disconnected, and that of a 2-node induces a connected 
subgraph of G whose complement is also connected. A linear-time algorithm for 
the construction of the tree T(G) can be found in [4]. 

Let u be an internal node of T(G). We denote by M(u) the module corresponding 
to u, which consists of the set of vertices of G in the subtree of T(G) 
rooted at node u. Let ch(u) = {u_1, u_2, ..., u_p} be the set of the children of u 
in T(G). Let G(u) be the representative graph of the module M(u), which is defined 
as follows: V(G(u)) = {u_1, u_2, ..., u_p}, and E(G(u)) = {(u_i, u_j) | (v_i, v_j) ∈ 
E(G), v_i ∈ M(u_i) and v_j ∈ M(u_j)}. 







Let G be a graph and T(G) its modular decomposition tree. The graph G is a 
P4-reducible graph iff for every internal 2-node u of T(G), G(u) 
is either a P4 or a bull graph. Moreover, the vertices of the P4, or of the P4 of the 
bull graph, are leaf vertices in T(G) (Giakoumakis and Vanherpe [2]). 



2. The Algorithm 



Let G be a P4-reducible graph on n vertices and m edges. In order to compute 
the number of spanning trees of the graph G, we make use of the Kirchhoff matrix 
tree theorem [1]; that is, we delete an arbitrary vertex w ∈ V(G) and all the edges 
incident on vertex w. 



Next, we set s(v) ← d(v) for every vertex v of the graph G − w, where d(v) is 
the degree of the vertex v in G; we call these labels of the vertices their s-labels. The 
algorithm works by contracting in a systematic fashion the contractible nodes of 
the modular decomposition tree of the graph G − w and by assigning to the leaf that 
is produced the highest-index vertex of G which is a child of the contracted node. 
The contractions are done by means of two functions, namely, Replace_Parallel_Series() 
and Replace_Neighborhood(), which also update the s-labels of the 
children/vertices of the contractible node. We note that the s-labels are assumed to 
be global variables in our algorithm. The algorithm Spanning_Trees_Number and 
the two functions are given in detail below. 



Let u be a 0(1)-node (that is, a 0-node or a 1-node) such that ch(u) = {v_1, v_2, ..., v_p} 
forms a contracted block of T(G); that is, each v_i ∈ ch(u) is a leaf. The function 
Replace_Parallel_Series() works as follows: 

o if u is a 1-node then increase the s-labels s(v_1), s(v_2), ..., s(v_p) by 1; 

o compute the parameter $e(u) := \sum_{i=1}^{p-1} \frac{1}{s(v_i)}$; 

o update the s-labels s(v_{p-1}) and s(v_p) as follows: 
s(v_{p-1}) := s(v_{p-1}) · (1 + s(v_p) · e(u)), and s(v_p) := s(v_p) / (1 + s(v_p) · e(u)); 
if u is a 1-node then decrease s(v_p) by 1; 

o make vertex v_p a child of p(u) and delete the vertices v_1, v_2, ..., v_{p-1} and 
node u from T(G). 



Let u be a 2-node such that ch(u) = {v_1, v_2, ..., v_p} forms a contracted block 
of T(G). If G(u) is a P4 then |ch(u)| = 4, or else if G(u) is a bull graph, then 
|ch(u)| = 5 (see [2]). The function Replace_Neighborhood() works as follows: 

o compute the endpoints v_1, v_4 and the midpoints v_2, v_3 of the P4 v_1 v_2 v_3 v_4 
of the graph induced by ch(u), and also the values of d_s and d_k, where 
d_s = d(v_1) = d(v_4) and d_k = d(v_2) = d(v_3); 
o if |ch(u)| = 4, then update the s-labels of the vertices of ch(u) as follows: 

s(v_1) = d_s    (1) 
s(v_2) = d_k + [...]    (2) 
s(v_3) = [...]    (3) 
s(v_4) = γ    (4) 





where 

γ = [...] ;    (5) 

o if |ch(u)| = 5, then update the s-labels of the vertices of ch(u) as follows: 

s(v_1) = d_s    (6) 
s(v_2) = [...]    (7) 
s(v_3) = [...]    (8) 
s(v_4) = γ    (9) 
s(v_5) = [...]    (10) 

where 

γ = [...] ;    (11) 

o make vertex v_p a child of node p(u) and delete the other vertices of ch(u) and 
node u from the tree T(G). 



We next describe an algorithm which computes the number of spanning trees τ(G) 
of a P4-reducible graph G; it works as follows: first it computes the degree d(v_i) of 
each vertex v_i ∈ V(G) and assigns s(v_i) := d(v_i), 1 ≤ i ≤ n. Then, it computes the 
graph G := G − v_n, where v_n ∈ V(G), and constructs its modular decomposition 
tree T := T(G); the resulting graph G has n − 1 vertices. Recall that v_i is a leaf 
of T(G), 1 ≤ i ≤ n − 1. Next, it repeatedly applies the function 
Replace_Parallel_Series(u, T) to each contracted 0(1)-node and Replace_Neighborhood(u, T) to 
each contracted 2-node by a bottom-up traversal of T. With this process, it 
finds the s-labels s(v_1), s(v_2), ..., s(v_{n-1}) of the vertices of T. Finally, it computes 
the number of spanning trees $\tau(G) := \prod_{i=1}^{n-1} s(v_i)$. 
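
The contraction of 0-nodes and 1-nodes can be tried out on cographs, whose decomposition trees contain no 2-node. The following is a minimal sketch, not the authors' implementation: the tree is supplied by hand as nested tuples (a hypothetical encoding), exact rationals replace the uniform-cost arithmetic, and the parameter e(u) is read as the sum of 1/s(v_i) over i < p, an assumption that the small checks mentioned afterwards are consistent with.

```python
from fractions import Fraction
from math import prod


def replace_parallel_series(labels, is_series):
    """Contract one 0-node (parallel) or 1-node (series) whose children are leaves.

    labels: s-labels of the children v_1..v_p (p >= 2), in order.
    Returns (labels of the deleted leaves v_1..v_{p-1}, new label of v_p).
    """
    s = [Fraction(x) for x in labels]
    if is_series:                       # 1-node: raise every label by 1
        s = [x + 1 for x in s]
    e = sum(1 / x for x in s[:-1])      # assumed form of e(u): sum of 1/s(v_i), i < p
    factor = 1 + s[-1] * e
    s[-2] *= factor
    s[-1] /= factor
    if is_series:                       # 1-node: lower s(v_p) by 1 again
        s[-1] -= 1
    return s[:-1], s[-1]


def spanning_trees(tree):
    """Product of all s-labels after contracting the tree of G - w bottom-up.

    tree: an int (a leaf, holding the degree of that vertex in G) or a pair
    (kind, children) with kind 'p' (0-node) or 's' (1-node).
    """
    deleted = []

    def contract(node):
        if not isinstance(node, tuple):
            return Fraction(node)
        kind, children = node
        child_labels = [contract(child) for child in children]
        gone, kept = replace_parallel_series(child_labels, kind == "s")
        deleted.extend(gone)
        return kept

    root_label = contract(tree)
    return prod(deleted) * root_label
```

For the complete graph K5, deleting one vertex leaves K4 with every remaining degree equal to 4, so the call is `spanning_trees(("s", [4, 4, 4, 4]))`; for the 4-cycle it is `spanning_trees(("s", [("p", [2, 2]), 2]))`. Both can be compared with the values given by Cayley's formula and by a direct determinant.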

Theorem 2.1. The number of spanning trees of a P4-reducible graph G on n vertices 
and m edges can be computed in O(n + m) time and space. 

References 

[1] N. Biggs, Algebraic Graph Theory, Cambridge University Press, London, 1974. 

[2] V. Giakoumakis and J-M. Vanherpe, On extended P4-reducible and P4-sparse graphs, 
Theor. Comput. Sci. 180 (1997) 269-286. 

[3] B. Jamison and S. Olariu, P4-reducible graphs, a class of tree representable graphs, 
Studies Appl. Math. 81 (1989) 79-87. 

[4] R.M. McConnell and J. Spinrad, Modular decomposition and transitive orientation, 
Discrete Math. 201 (1999) 189-241. 

Stavros D. Nikolopoulos and Charis Papadopoulos 

Dept. of Computer Science, University of Ioannina, GR-45110 Ioannina, Greece 
stavros@cs.uoi.gr 




Part III 

Analysis of Algorithms 







On the Stationary Search Cost for the 
Move-to-Root Rule with Random Weights 

Javiera Barrera and Christian Paroissin 



ABSTRACT: Consider a binary search tree containing n items, updated 
according to the move-to-root rule as first defined by Allen and Munro [1]. Assume 
that the request probabilities are themselves random. A formula for the expected 
stationary search cost is derived from a classical result and is compared with the case 
of a list updated according to the move-to-front rule [2]. The case of Gamma request 
probabilities is then studied. 



1. Introduction and model 



Consider a binary search tree containing n items updated as follows: at each unit 
of discrete time, an item is requested independently of the previous requests and is 
moved to the root of the binary tree. This way of updating a tree is called the 
move-to-root rule and was suggested by Allen and Munro [1]. The aim of this heuristic 
is to keep a binary search tree in near-optimal form. The associated Markov chain 
was studied by Dobrow and Fill [3]. One could be interested in the search cost 
of an item in the binary tree at a given time. The search cost is simply defined 
as the number of ancestors of the item requested (or equivalently as the depth of 
the item in the tree minus one). Let $\omega = (w_i)_{i \ge 1}$ be a sequence of independent 
strictly positive random variables. For any $i \ge 1$, $w_i$ represents the weight of 
item i. Let us consider the first n items. One can construct a vector of request 
probabilities $p_n = (p_1, \dots, p_n)$ from the weights as follows: 



$$\forall i \in \{1, \dots, n\}, \qquad p_i = \frac{w_i}{W_n}, \qquad \text{where } W_n = \sum_{i=1}^{n} w_i .$$



Such a construction of random request probabilities was first made by Kingman [4] 
and is a natural way to model the unknown in other areas such as Bayesian statistics. 
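
To make the model concrete, here is a small sketch that builds request probabilities from weights and evaluates the stationary expected search cost for fixed probabilities. It relies on the classical stationary expression $2 \sum_{i<j} p_i p_j / (p_i + \cdots + p_j)$, which comes from the fact that, in the stationary regime, item i is an ancestor of item j exactly when i is the most recently requested item among those with keys between i and j. That expression is not restated in this text and is an assumption of the sketch; the function names are illustrative only.

```python
from fractions import Fraction
import random


def request_probabilities(weights):
    """Normalize strictly positive weights w_1..w_n into p_i = w_i / W_n."""
    total = sum(weights)
    return [w / total for w in weights]


def expected_search_cost(p):
    """Stationary expected number of ancestors of the requested item.

    Evaluates 2 * sum_{i<j} p_i p_j / (p_i + ... + p_j), with items in key order.
    """
    prefix = [0]
    for x in p:
        prefix.append(prefix[-1] + x)
    cost = 0
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            cost += p[i] * p[j] / (prefix[j + 1] - prefix[i])
    return 2 * cost


def random_gamma_cost(n, shape, rng):
    """Search cost for one draw of i.i.d. Gamma(shape, 1) weights."""
    weights = [rng.gammavariate(shape, 1.0) for _ in range(n)]
    return expected_search_cost(request_probabilities(weights))
```

Averaging `random_gamma_cost(n, shape, random.Random(seed))` over many seeds gives a Monte Carlo estimate of the expectation over the random weights; passing `Fraction` weights to `request_probabilities` keeps the deterministic case exact.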



2. Expectation of the stationary search cost 

Let us denote by $S_n$ the stationary search cost. Using the same technique as in [2], 
one can get the following theorem from a classical result: 

Theorem 2.1. For a sequence $\omega$ of independent random weights, we have: 

[...]    (1) 

where $\phi_i$ is the Laplace transform of $w_i$, for all $i \ge 1$. 







This expected stationary search cost can be compared with the one obtained when the 
data structure used is a list. Let us denote by $S'_n$ the stationary search cost of a 
list updated with the move-to-front rule (see [2]). From equation (1), since $\phi$ is a 
decreasing function of the time, one gets (see corollary 1 in [2]): 

[...] 

This result is not surprising, since faster data retrieval is one of the advantages 
of binary search trees over lists. One could prove the following inequalities: 

Proposition 2.2. For a sequence $\omega$ of independent random weights, we have: 

$$E[S_n] \le H(p_n) \le 2 \log n , \qquad (2)$$

where $H(p_n)$ is the Shannon entropy. 

This bound is not a surprise, since log n is the order of the height of a random 
binary search tree with n nodes. Moreover, the expectation reaches equality 
when one considers deterministic and equal weights. 



3. Gamma request probabilities 

Let us consider a sequence of independent random variables having the Gamma 
distribution with parameters $\alpha_i$ and 1. Thus one gets: 

$$E[S_n] = \frac{2}{A_n} \sum_{1 \le i < j \le n} \frac{\alpha_i \alpha_j}{\alpha_i + \cdots + \alpha_j + 1} ,$$

where $A_n = \alpha_1 + \cdots + \alpha_n$. In the case of i.i.d. random variables (all $\alpha_i$'s equal to $\alpha$), 
one can compute an asymptotic equivalent: $E[S_n] \sim 2 \log n$. Note that it does 
not depend on $\alpha$. Moreover, the asymptotic equivalent equals the expectation in 
the case of deterministic and equal weights, which appears as the "worst case". 
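
The Gamma case lends itself to a quick exact computation. The sketch below assumes the closed form $E[S_n] = \frac{2}{A_n} \sum_{i<j} \alpha_i \alpha_j / (\alpha_i + \cdots + \alpha_j + 1)$, which is what one obtains from the Dirichlet law of normalized Gamma weights combined with the classical stationary formula for fixed probabilities; both the function names and that derivation are assumptions of this illustration rather than statements taken from the text.

```python
from fractions import Fraction


def gamma_expected_cost(alphas):
    """(2 / A_n) * sum_{i<j} a_i a_j / (a_i + ... + a_j + 1), in exact arithmetic."""
    a = [Fraction(x) for x in alphas]
    prefix = [Fraction(0)]
    for x in a:
        prefix.append(prefix[-1] + x)
    total = prefix[-1]                      # A_n
    acc = Fraction(0)
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            acc += a[i] * a[j] / (prefix[j + 1] - prefix[i] + 1)
    return 2 * acc / total


def equal_weights_cost(n):
    """Expected cost for deterministic equal weights, p_i = 1/n."""
    return Fraction(2, n) * sum(Fraction(n - d, d + 1) for d in range(1, n))
```

For two items with $\alpha_1 = \alpha_2 = 1$ the probability $p_1$ is uniform on [0, 1], and the value returned agrees with the direct computation of $E[2 p_1 (1 - p_1)]$. For equal parameters the cost grows with $\alpha$ and stays below `equal_weights_cost(n)`, in line with the "worst case" remark.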



References 

[1] B. Allen and I. Munro. Self-organizing binary search trees. J. Assoc. Comput. Mach., 
25:526-535, 1978. 

[2] J. Barrera and C. Paroissin. On the distribution of the stationary search cost for the 
move-to-front with random weights. J. Appl. Prob., 41(1):250-262, 2004. 

[3] R.P. Dobrow and J.A. Fill. On the Markov chain for the move-to-root rule for binary 
search trees. Ann. Appl. Probab., 5(1):1-19, 1995. 

[4] J.F.C. Kingman. Random discrete distributions. J. R. Statist. Soc., B37:1-22, 1975. 

Javiera Barrera 

CMM, Univ. de Chile, Casilla 170-3, Correo 7, Santiago, Chile 
jbarrera@dim.uchile.cl 

Christian Paroissin 

MODAL’X, Univ. Paris X, 200 av. de la République, 92001 Nanterre Cedex, France 
cparoiss@u-paris10.fr 







Average-Case Analysis for the Probabilistic Bin 
Packing Problem 

Monia Bellalouna, Salma Souissi, and Bernard Ycart 



ABSTRACT: In the Probabilistic Bin Packing Problem (PBPP), some items 
are randomly deleted after having been placed into bins. The problem is to rear- 
range the remaining items, using the a priori solution. The initial arrangement 
being done with the Next Fit Decreasing heuristic, we consider two procedures. 
In the first one, the NF algorithm is applied to the new list. In the second one, 
successive groups of bins are optimally rearranged. In both cases, we prove a law 
of large numbers and a central limit theorem for the number of occupied bins as 
the initial number of items tends to infinity. 



1. Introduction 

Bin Packing is a classical NP-hard optimization problem [10]: given items of 
sizes $(x_1, \dots, x_n)$, all smaller than 1, one must pack them into bins of size 1, so as 
to minimize the total number of non-empty bins. Many approximation heuristics 
have been proposed and studied: see Coffman et al. [5] for a survey. We shall 
focus here on the Next Fit Decreasing (NFD) heuristic [1]. Firstly the items are 
ranked in decreasing order (this can be done in O(n log n) time). Then they are 
put into bins according to the Next Fit algorithm: one bin is open at a time; 
when a new item has to be placed, either it fits in the open bin or it does not, in 
which case the current bin is closed and a new one is opened (it takes O(n) time 
to place the items once sorted). In the average case analysis, the item sizes are 
random variables, and so is the number of non-empty bins. Its distribution has 
been thoroughly studied, in particular by Csirik et al. [7, 6], Hofri and Kamhi [13] 
and Rhee [17, 18] (see also section 5.2 of [11] or section 10.3 of [12]). 

The idea of so-called Probabilistic Combinatorial Optimisation Problems 
comes from Jaillet [14, 15], who introduced it for the Traveling Salesman Problem 
(see also [3, 4]). The Probabilistic Bin Packing Problem (PBPP) was first 
studied in [2]. The idea is the following. Assume that a list of n items has been 
given, and an a priori solution (exact or approximate) has been found for the BPP. 
Suppose now that some items randomly disappear from the list. Can the knowledge 
of the a priori solution for the full list be used to construct a solution for the 
reduced one? Can this be done efficiently without reopening simultaneously too 
many bins of the a priori solution? 

The aim of this paper is to propose an average case analysis of the PBPP, 
when the a priori solution is obtained through the NFD heuristic. We shall deal 
with two sources of randomness. Firstly, the initial sizes of the items are independent 
and identically distributed random variables (i.i.d.r.v.’s) $(X_1, \dots, X_n)$. 
Secondly, once they have been sorted in decreasing order and placed into bins by 
the NF algorithm, a random binary decision is taken: for each i = 1, ..., n the 
item number i remains or disappears. To formalize this, we consider an n-tuple 







$(U_1, \dots, U_n)$ of i.i.d.r.v.’s, uniformly distributed on [0, 1]. The two random vectors 
$(X_i)_{i=1,\dots,n}$ and $(U_i)_{i=1,\dots,n}$ are independent. The probability for an item to stay 
in the list may depend on its size: we denote by p(x) the probability for an item of 
size x to stay in the list. It will be convenient to view disappearing items as objects 
whose size has become null. Thus the new list of sizes is $(Y_1, \dots, Y_n)$, where for 
i = 1, ..., n: 

$$Y_i = X_i \, \mathbb{1}_{\{U_i \le p(X_i)\}} ,$$

denoting by $\mathbb{1}_A$ the indicator of an event A. 

If the NFD heuristic has been used for the a priori solution, an obvious proce- 
dure immediately comes to mind. Since the initial items were ranked in decreasing 
order, so are the remaining ones, and it is fast and natural to apply again from 
scratch the NF algorithm to the list of remaining items. The average case analysis 
of this procedure is proposed in section 2. The total number of bins will be proved 
to satisfy a law of large numbers and a central limit theorem, and an explicit 
expression for the asymptotic mean and variance will be given (theorem 2.3). 
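
The first procedure can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: sizes are floats in (0, 1], `keep_prob` stands for the function p of the text, deleted items are simply dropped rather than kept as items of size 0, and the strict comparison against a uniform draw is an implementation choice.

```python
import random


def next_fit(sizes, capacity=1.0):
    """Next Fit: keep a single open bin, open a new one when an item does not fit."""
    bins = []
    load = None
    for x in sizes:
        if load is None or load + x > capacity:
            bins.append([])
            load = 0.0
        bins[-1].append(x)
        load += x
    return bins


def nfd(sizes):
    """Next Fit Decreasing: sort in decreasing order, then apply Next Fit."""
    return next_fit(sorted(sizes, reverse=True))


def repack_from_scratch(sizes, keep_prob, rng):
    """A priori NFD packing, random deletions, then Next Fit on the survivors."""
    a_priori = nfd(sizes)
    ordered = [x for b in a_priori for x in b]          # still in decreasing order
    survivors = [x for x in ordered if rng.random() < keep_prob(x)]
    return a_priori, next_fit(survivors)
```

Calling `repack_from_scratch(sizes, lambda x: 0.5, random.Random(0))` returns both packings, so the two bin counts can be compared directly on simulated lists.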

However, it is not in the spirit of Probabilistic Combinatorial Problems not 
to use the a priori solution once the items have been randomly deleted. Therefore, 
we shall study another heuristic. Suppose the a priori NFD solution has been 
computed, its bins being numbered by order of opening. Once the items have been 
randomly deleted, vacancy is left in some of the bins. The Group Rearrangement 
(GR) procedure depends on a fixed integer m which is the number of bins to be 
opened simultaneously. Here is the algorithm. 

1. Open the bins of the a priori solution by groups of m, one group at a time: 
first bins with numbers 1 to m, then m + 1 to 2m, and so on. 

2. For each group of m bins, rearrange the remaining items in an optimal 
way. 

3. Eliminate those bins that have been emptied. 

The average case analysis of the GR procedure is treated in section 3. Again, a law 
of large numbers and a central limit theorem for the total number of non-empty 
bins will be proved (theorem 3.1). 
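
The three steps of the GR procedure can be sketched as follows. The exhaustive search used for the optimal rearrangement of a group is a stand-in chosen for brevity (it is only practical for the small groups considered here), and the input format, a list of bins each holding the sizes that survived deletion, is an assumption of this sketch.

```python
def min_bins(items, capacity=1.0, eps=1e-12):
    """Smallest number of bins holding all items, by exhaustive branch and bound."""
    items = sorted(items, reverse=True)
    best = len(items)                 # one bin per item always works
    loads = []

    def place(i):
        nonlocal best
        if len(loads) >= best:        # cannot improve on the best packing found
            return
        if i == len(items):
            best = len(loads)
            return
        for j in range(len(loads)):   # try every bin already open
            if loads[j] + items[i] <= capacity + eps:
                loads[j] += items[i]
                place(i + 1)
                loads[j] -= items[i]
        loads.append(items[i])        # or open a new bin
        place(i + 1)
        loads.pop()

    place(0)
    return best


def group_rearrangement(bins_after_deletion, m):
    """Number of non-empty bins left when groups of m consecutive bins are repacked."""
    total = 0
    for start in range(0, len(bins_after_deletion), m):
        group = bins_after_deletion[start:start + m]
        total += min_bins([x for b in group for x in b])
    return total
```

With the a priori packing [[0.6], [0.5, 0.4], [0.3]] and the item of size 0.5 deleted, `group_rearrangement([[0.6], [0.4], [0.3]], 2)` merges the first two bins, whereas m = 1 only removes bins that were emptied.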

Of course, the GR procedure is neither faster, nor better on average than the 
NFD heuristic: both run in linear time, and the asymptotic mean number of bins 
is larger for the former than for the latter. However, numerical evidence shows that 
the difference is small. We are not able at this point to propose a similar study 
for the optimal a priori solution. But we consider our NFD results as a reason to 
believe that local rearrangements inside small sized groups of bins, such as in the 
GR procedure, may bring a fast and relatively good solution to the PBPP, when 
starting from an a priori solution, be it optimal or not. 



2. PBPP by the NFD heuristic 

In this section, we study the asymptotics of the total number of bins filled by the 
NFD heuristic, for items of random sizes, once some of them have been randomly 
deleted. 

Two independent sequences of i.i.d.r.v.’s are given: $(X_i)_{i \ge 1}$ and $(U_i)_{i \ge 1}$. The 
$X_i$'s are the sizes of the original items, and the $U_i$'s are the random variables that 
decide on their deletion. The probability distribution function of the $X_i$'s is denoted 
by F and the $U_i$'s have uniform distribution on [0, 1]. A measurable function p, 
from [0, 1] into itself, is given. If x is an item size, p(x) is its probability to remain 
in the new list. As already pointed out, it is convenient to consider deleted objects 







as items of size 0. Thus the new list of item sizes after random deletions becomes 
$(Y_i)_{i \ge 1}$, where for all $i \ge 1$: 

$$Y_i = X_i \, \mathbb{1}_{\{U_i \le p(X_i)\}} .$$

Notice that the $Y_i$'s are still i.i.d.r.v.’s. Denote by $A_n$ the number of bins used 
by the NFD algorithm to arrange the n items of sizes $Y_1, \dots, Y_n$. The asymptotic 
study of $A_n$ requires very little adaptation of the classical proof for uniformly 
distributed sizes, developed by Csirik et al. [6] (see Hofri [12] section 10.3.1, 
p. 543 ff.). We shall review below the main arguments. We are aware of the more 
precise approach of Rhee [17], who gives a much better bound for $A_n$ than that 
of lemma 2.2. The reason why we chose Csirik et al.’s truncation technique is that 
it can also be used for the GR procedure, to be treated in section 3. 

The first observation is that the number of bins depends more on the types 
of the items than on their actual sizes. 



Definition 2.1. For $k \ge 1$, an item is said to be of type k if its size x is such that 

$$\frac{1}{k+1} < x \le \frac{1}{k} .$$

Thus a bin can accommodate exactly k objects of type k. To account for 
deletions, we shall agree that an item of size 0 has type 0. With our probabilistic 
hypotheses, the item types are i.i.d.r.v.’s with values in $\mathbb{N}$. We shall denote by 
$p = (p_k)_{k \in \mathbb{N}}$ their distribution. For $k \ge 1$, the probability for an item to be of type 
k is 

$$p_k = \int_{\left(\frac{1}{k+1}, \frac{1}{k}\right]} p(x)\, dF(x) ,$$

whereas its probability to be of type 0 (deletion) is 

$$p_0 = 1 - \sum_{k=1}^{\infty} p_k = 1 - \int_0^1 p(x)\, dF(x) .$$

As a particular case, if the original item sizes are uniformly distributed on [0, 1] 
and the function p is constant, one gets $p_0 = 1 - p$ and for $k \ge 1$: 

$$p_k = \frac{p}{k(k+1)} .$$

The results that follow only depend on the distribution p. 

Since the items are examined in decreasing order of size, all items of type 1 
are treated first, and placed alone in as many bins. Then come type 2 items. The 
first of them possibly fits in the same bin as the last type 1 item, the others are 
placed 2 by 2 into new bins, and so on. It is intuitively clear that, apart from a 
few “frontier” bins that may contain items of different types, most bins will host 
a fixed number of items of the same type. Lemma 2.2 below gives bounds on the 
number of used bins, in terms of two functions of the item types. 







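Lemma 2.2. Let $r \ge 1$ be an integer. Define the two functions $\phi_1$ and $\phi_2$, from 
[0, 1] into itself, by: 

$$\phi_1(x) = \begin{cases} \frac{1}{k} & \text{if } \frac{1}{k+1} < x \le \frac{1}{k},\ k = 1, \dots, r-1 , \\ 0 & \text{if } 0 \le x \le \frac{1}{r} , \end{cases}$$

$$\phi_2(x) = \begin{cases} \frac{1}{k} & \text{if } \frac{1}{k+1} < x \le \frac{1}{k},\ k = 1, \dots, r-1 , \\ \frac{1}{r} & \text{if } 0 < x \le \frac{1}{r} , \\ 0 & \text{if } x = 0 . \end{cases}$$

Let $y_1, \dots, y_n$ be n (possibly null) item sizes. Let $a_n$ be the total number of bins 
required to arrange those items using the NFD heuristic. Then: 

$$\sum_{i=1}^{n} \phi_1(y_i) - (r-1) \le a_n \le \sum_{i=1}^{n} \phi_2(y_i) + r .$$

Proof. For $k \ge 1$, let $n_k$ be the number of type k items. The number of bins they 
will occupy is at least $\lfloor n_k / k \rfloor$, where $\lfloor \cdot \rfloor$ denotes the integer part. Hence the lower 
bound, neglecting items of size at most 1/r. For the upper bound, all items of type $k \le r-1$ 
can be accommodated in at most $\lfloor n_k / k \rfloor + 1$ bins, and all items of type $k \ge r$ in at 
most $\lfloor \sum_{k \ge r} n_k / r \rfloor + 1$ bins. □ 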
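
The two bounds are easy to check on simulated lists. In the sketch below the lower bound is taken as the sum of $\phi_1$ minus (r − 1) and the upper bound as the sum of $\phi_2$ plus r, which is how the proof reads; sizes are exact rationals so that the type boundaries 1/k are handled without rounding, and the helper names are illustrative.

```python
from fractions import Fraction
import random


def item_type(x):
    """Type k >= 1 when 1/(k+1) < x <= 1/k; type 0 for a deleted (null) item."""
    return 0 if x == 0 else int(1 / x)


def nfd_bin_count(sizes):
    """Number of bins used by Next Fit Decreasing; null items occupy no bin."""
    count, load = 0, None
    for x in sorted((s for s in sizes if s > 0), reverse=True):
        if load is None or load + x > 1:
            count += 1
            load = 0
        load += x
    return count


def lemma_bounds(sizes, r):
    """Return (sum of phi_1 - (r - 1), sum of phi_2 + r) for the given sizes."""
    lower = upper = Fraction(0)
    for x in sizes:
        k = item_type(x)
        if k == 0:
            continue
        if k <= r - 1:
            lower += Fraction(1, k)
            upper += Fraction(1, k)
        else:                          # types >= r count 0 below and 1/r above
            upper += Fraction(1, r)
    return lower - (r - 1), upper + r


def random_instance(n, seed):
    """n sizes on a grid of step 1/1000 in [0, 1]; 0 plays the role of a deletion."""
    rng = random.Random(seed)
    return [Fraction(rng.randint(0, 1000), 1000) for _ in range(n)]
```

Comparing `nfd_bin_count(sizes)` with `lemma_bounds(sizes, r)` for several values of r shows how the gap between the bounds first shrinks and then grows again with r, which is the trade-off exploited later when r is chosen as a function of n.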

When the input sizes are random, lemma 2.2 provides bounds on $A_n$ in terms of 
two sums of random variables: 

$$\sum_{i=1}^{n} \phi_1(Y_i) - (r-1) \le A_n \le \sum_{i=1}^{n} \phi_2(Y_i) + r . \qquad (1)$$

In (1), both $S_{1,n} = \sum_{i=1}^{n} \phi_1(Y_i)$ and $S_{2,n} = \sum_{i=1}^{n} \phi_2(Y_i)$ are sums of bounded i.i.d.r.v.’s. 
Their asymptotic behavior (exponential tail inequalities, law of large numbers, 
central limit theorem) is described by basic results of probability theory (see [8, 9] 
or [16] as general references). These can be passed to $A_n$, through a careful choice 
of the free parameter r. 

For $r \ge 1$ denote by $q_r$ the tail probability for the distribution of types: 

$$q_r = \sum_{k=r}^{\infty} p_k = 1 - \sum_{k=0}^{r-1} p_k .$$

We will assume that $q_r$ decreases at least as fast as some negative power of r: there 
exist two positive constants c and $\alpha$ such that for all $r \ge 1$, 

$$q_r \le c\, r^{-\alpha} . \qquad (2)$$

This is actually an assumption on both the behavior of the size distribution function 
F close to 0, and the function p. 



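Theorem 2.3. Under the previous hypotheses, denote by $\mu$ and $\sigma^2$ the following 
asymptotic mean and variance: 

$$\mu = \sum_{k=1}^{\infty} \frac{p_k}{k} , \qquad (3)$$

$$\sigma^2 = \sum_{k=1}^{\infty} \frac{p_k}{k^2} - \mu^2 . \qquad (4)$$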





Then the following results hold for $A_n$. 

1. Exponential tail inequality: for all x > 0 and $n \ge 1$, 

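$$P\left[\, |A_n - n\mu| > x\sqrt{n} \,\right] \le \exp\left( -2 \left( x - \frac{1}{\sqrt{n}} - 2\, c^{\frac{1}{2+\alpha}}\, n^{-\frac{\alpha}{2(2+\alpha)}} \right)^{2} \right) . \qquad (5)$$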

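2. Law of large numbers: 

$$\lim_{n \to \infty} \frac{A_n}{n} = \mu \quad \text{a.s.} \qquad (6)$$

3. Central limit theorem: 

$$\lim_{n \to \infty} P\left[ \frac{1}{\sqrt{n \sigma^2}} (A_n - n\mu) \le x \right] = \Phi(x) , \qquad (7)$$

where $\Phi$ denotes the standard Gaussian probability distribution function. 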

For the particular case of item sizes uniformly distributed on [0, 1] and a 
constant value of p, the asymptotic mean and variance can be expressed in terms 
of Riemann’s Zeta function: 

$$\zeta(x) = \sum_{n=1}^{\infty} \frac{1}{n^x} .$$

One gets: 

$$\mu = p\,(\zeta(2) - 1) \simeq 0.645\, p ,$$

and: 

$$\sigma^2 = p\,(\zeta(3) - \zeta(2) + 1) - p^2 (\zeta(2) - 1)^2 \simeq 0.557\, p - 0.416\, p^2 .$$
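
These closed forms can be checked numerically. The sketch below assumes that the asymptotic mean and variance are the sums of $p_k / k$ and of $p_k / k^2$ minus $\mu^2$, with $p_k = p / (k(k+1))$ in the uniform case; the truncation level `terms` and the function names are choices made for this illustration.

```python
import math


def uniform_type_probability(p, k):
    """p_k = p / (k (k + 1)) for uniform sizes and a constant survival probability p."""
    return p / (k * (k + 1))


def asymptotic_mean_variance(p, terms=200_000):
    """Truncated series for mu = sum p_k / k and sigma^2 = sum p_k / k^2 - mu^2."""
    mu = sum(uniform_type_probability(p, k) / k for k in range(1, terms + 1))
    second = sum(uniform_type_probability(p, k) / k**2 for k in range(1, terms + 1))
    return mu, second - mu**2


def closed_form(p, zeta3=1.2020569031595942):
    """mu = p (zeta(2) - 1), sigma^2 = p (zeta(3) - zeta(2) + 1) - p^2 (zeta(2) - 1)^2."""
    zeta2 = math.pi**2 / 6
    return p * (zeta2 - 1), p * (zeta3 - zeta2 + 1) - p**2 * (zeta2 - 1) ** 2
```

Comparing `asymptotic_mean_variance(p)` with `closed_form(p)` for a few values of p reproduces the constants 0.645, 0.557 and 0.416 quoted above to the displayed precision.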



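Proof. For j = 1, 2, we shall denote by $\mu_j$ the expectation of $\phi_j(Y_i)$, and by $\sigma_j^2$ its 
variance: 

$$\mu_1 = \sum_{k=1}^{r-1} \frac{p_k}{k} , \qquad \mu_2 = \sum_{k=1}^{r-1} \frac{p_k}{k} + \frac{q_r}{r} ,$$

$$\sigma_1^2 = \sum_{k=1}^{r-1} \frac{p_k}{k^2} - \mu_1^2 , \qquad \sigma_2^2 = \sum_{k=1}^{r-1} \frac{p_k}{k^2} + \frac{q_r}{r^2} - \mu_2^2 .$$

Obviously, as r tends to infinity, $\mu_1$ and $\mu_2$ tend to $\mu$, whereas $\sigma_1^2$ and $\sigma_2^2$ tend to $\sigma^2$. 
From (1), the following bounds on the PDF of $A_n$ are easily deduced. 

$$P[S_{2,n} - n\mu_2 \le x - r + n(\mu - \mu_2)] \le P[A_n - n\mu \le x] \le P[S_{1,n} - n\mu_1 \le x + r + n(\mu - \mu_1)] . \qquad (8)$$

As sums of bounded i.i.d.r.v.’s, both $S_{1,n}$ and $S_{2,n}$ satisfy classical exponential tail 
inequalities, such as Hoeffding’s [16]: 

$$P\left[\, |S_{j,n} - n\mu_j| \ge z\sqrt{n} \,\right] \le 2 \exp(-2 z^2) .$$

Both also satisfy the central limit theorem: 

$$\lim_{n \to \infty} P\left[ \frac{1}{\sqrt{n \sigma_j^2}} (S_{j,n} - n\mu_j) \le x \right] = \Phi(x) .$$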

To derive from (8) the corresponding results for $A_n$, one needs to let the parameter 
r tend to infinity, as a function of n. Recall from (2) that $q_r \le c\, r^{-\alpha}$. For all $n \ge 1$, 
we set r = r(n) as follows: 

$$r(n) = \left\lfloor (cn)^{\frac{1}{2+\alpha}} \right\rfloor + 1 . \qquad (9)$$







One readily checks that: 

$$\frac{n\, q_{r(n)}}{r(n)} \le (cn)^{\frac{1}{2+\alpha}} .$$

From there, (5) easily follows from Hoeffding’s inequality applied to $S_{1,n}$ and $S_{2,n}$. 
To derive the strong law of large numbers from an exponential inequality such 
as (5) is an easy application of the Borel-Cantelli lemma. For the central limit 
theorem (7), one just has to divide variables by $\sqrt{n}$ in (8), and let n tend to 
infinity: with our choice of r(n), both $r(n)/\sqrt{n}$ and $\sqrt{n}\, q_{r(n)} / r(n)$ tend to 0. □ 



3. The Group Rearrangement procedure 

We now turn to the GR procedure, in which bins are taken by groups of m, each 
group being optimally rearranged after deletions. As we saw before, in the a priori 
solution most bins contain items of the same type. Therefore, most groups of bins 
will be homogeneous in the sense that each bin of the group contains exactly k 
items of type k; let us call "k-group" such a group of m bins. We need to restrict 
slightly our assumption on the function p: we assume now that it is constant for 
objects of the same type, and denote by $p_k$ the probability for an item of type k 
to remain in the list. 

Consider a k-group. The number of remaining items in all its m bins has 
binomial distribution with parameters mk and $p_k$. Let us denote by $\pi_{k,m} = 
(\pi_{k,m}(i))_{i=0,\dots,m}$ the probability distribution of the number of remaining bins, 
once the k-group has been rearranged: 

$$\pi_{k,m}(i) = \begin{cases} (1 - p_k)^{mk} & \text{for } i = 0 , \\ \sum_{j=(i-1)k+1}^{ik} \binom{mk}{j} p_k^{\,j} (1 - p_k)^{mk-j} & \text{for } i = 1, \dots, m . \end{cases}$$

The expectation and the variance of $\pi_{k,m}$ will be denoted by $e_{k,m}$ and $v_{k,m}$ respectively. 

$$e_{k,m} = \sum_{i=0}^{m} i\, \pi_{k,m}(i) \quad \text{and} \quad v_{k,m} = \sum_{i=0}^{m} i^2\, \pi_{k,m}(i) - e_{k,m}^2 .$$

Let $p_k^*$ be the initial proportion of type k items: for $k \ge 1$, 

$$p_k^* = \int_{\left(\frac{1}{k+1}, \frac{1}{k}\right]} dF(x) .$$

We shall make the same assumption on the tail of $p^*$ as we did for p: there exist 
two positive constants c and $\alpha$ such that for all $r \ge 1$, 

$$q_r^* = \sum_{k=r}^{\infty} p_k^* \le c\, r^{-\alpha} . \qquad (10)$$

Let $B_{n,m}$ be the number of remaining bins after the GR procedure. The asymptotics 
of $B_{n,m}$ is described in the following result. 






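Theorem 3.1. Under the previous hypotheses, denote by $\mu_m$ and $\sigma_m^2$ the following 
asymptotic mean and variance. 

$$\mu_m = \sum_{k=1}^{\infty} p_k^* \, \frac{e_{k,m}}{km} , \qquad (11)$$

$$\sigma_m^2 = \sum_{k=1}^{\infty} p_k^* \left( \frac{v_{k,m}}{km} + \frac{e_{k,m}^2}{(km)^2} \right) - \mu_m^2 . \qquad (12)$$

Then the following results hold for $B_{n,m}$. 

1. Law of large numbers: 

$$\lim_{n \to \infty} \frac{B_{n,m}}{n} = \mu_m \quad \text{a.s.} \qquad (13)$$

2. Central limit theorem: 

$$\lim_{n \to \infty} P\left[ \frac{1}{\sqrt{n \sigma_m^2}} (B_{n,m} - n\mu_m) \le x \right] = \Phi(x) . \qquad (14)$$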

Essentially, $B_{n,m}$ behaves asymptotically as a sum of n i.i.d.r.v.’s, each having 
expectation $\mu_m$ and variance $\sigma_m^2$: it can be viewed as the sum of the individual 
contributions of the original n items to the final packing. Indeed, for n large, a 
typical item belongs to a k-group with probability $p_k^*$. The contribution of km such 
items (one k-group) to $B_{n,m}$ has expectation $e_{k,m}$ and variance $v_{k,m}$; hence each 
contribution should have expectation $e_{k,m}/(km)$ and variance $v_{k,m}/(km)$. Thus the expected 
squared contribution of an object of type k should be $v_{k,m}/(km) + e_{k,m}^2/(km)^2$. So $\sigma_m^2$ can be 
seen as the variance for the contribution of a typical item to $B_{n,m}$. 

Clearly, as m increases, the space wasted by the GR procedure compared 
to the NFD heuristic diminishes. In particular, the asymptotic expectation $\mu_m$ 
defined by (11) tends to $\mu$. One may wonder what is the difference for small values 
of m. To get a partial answer, we computed numerically $\mu_m - \mu$, for m = 2, ..., 5, 
in the particular case where the item sizes are uniformly distributed on [0, 1] and 
the probability p is a constant. Figure 1 shows a plot of $\mu_m - \mu$ as a function of p. 
It turns out that the difference between the global algorithm (NFD) and the local 
one (GR) is relatively small, even for m = 2. In order to understand why, let us fix 
m and $p_k$, and look at the asymptotic behavior of $e_{k,m}$ as k increases. The law of 
large numbers implies that $e_{k,m}$ converges to i for all values of $p_k$ in the interval 
$\left] \frac{i-1}{m}, \frac{i}{m} \right]$, for i between 1 and m. In other terms, as k increases, $e_{k,m}$ approaches 
$\lfloor m p_k \rfloor + 1$. So $\mu_m$ is actually close to the following sum: 

$$\mu_m \approx \sum_{k=1}^{\infty} p_k^* \, \frac{\lfloor m p_k \rfloor + 1}{km} ,$$

to be compared with 

$$\mu = \sum_{k=1}^{\infty} \frac{p_k^* \, p_k}{k} .$$

This also accounts for the modes in $\mu_m - \mu$, plotted as a function of p (figure 1). 
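
A rough numerical version of this comparison can be sketched as follows, for uniform sizes and a constant survival probability p strictly between 0 and 1. It assumes that an optimally rearranged k-group with j surviving items uses the ceiling of j/k bins, so that $e_{k,m}$ is the mean of that ceiling under a Binomial(mk, p) law, and it truncates the series over k; the function names and the truncation level are choices of this sketch.

```python
import math


def binomial_pmf(n, j, p):
    """P[X = j] for X ~ Binomial(n, p), 0 < p < 1, computed through logarithms."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
               + j * math.log(p) + (n - j) * math.log1p(-p))
    return math.exp(log_pmf)


def expected_bins(k, m, p):
    """e_{k,m}: mean of ceil(X / k) for X ~ Binomial(m k, p)."""
    n = m * k
    return sum(-(-j // k) * binomial_pmf(n, j, p) for j in range(n + 1))


def mu_gap(m, p, terms=200):
    """Truncated value of mu_m - mu, using p*_k = 1 / (k (k + 1))."""
    gap = 0.0
    for k in range(1, terms + 1):
        p_star = 1.0 / (k * (k + 1))
        gap += p_star * (expected_bins(k, m, p) / (k * m) - p / k)
    return gap
```

Evaluating `mu_gap(m, p)` on a grid of values of p for m = 2, ..., 5 gives curves of the kind described for figure 1; the terms with k = 1 contribute nothing, since a bin holding a single item is either kept or emptied.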



Proof. Controlling $B_{n,m}$ by sums of independent random variables is not as 
straightforward as for $A_n$, though a similar truncation technique will be used. We shall 
first describe the lower bound, then the upper bound. 








Figure 1. Asymptotic difference between the NFD heuristic and 
the GR procedure, for m = 2, ..., 5. 



To get a lower bound, we restrict ourselves to k-groups, with k < r: $B_{n,m}$ is 
certainly larger than the number of remaining bins after rearrangement of those 
k-groups, and suppression of all other groups. Denote by $N_k$ the number of items 
of type k in the original list: the distribution of $N_k$ is binomial with parameters n 
and $p_k^*$. If there are $N_k$ items of type k, then the number of k-groups is certainly 
larger than 

$$G_k = [...] . \qquad (15)$$

For all $k \ge 1$, consider independent sequences of i.i.d.r.v.’s $(K_k^{(l)})_{l \ge 1}$, all independent 
from the $X_i$'s, where $K_k^{(l)}$ has distribution $\pi_{k,m}$. Define $S_{1,n}$ as the sum 

$$S_{1,n} = \sum_{k=1}^{r-1} \sum_{l=1}^{G_k} K_k^{(l)} .$$

The previous reasoning shows that $S_{1,n}$ is smaller than $B_{n,m}$ in the stochastic 
ordering sense: for all x, 

$$P[S_{1,n} \le x] \ge P[B_{n,m} \le x] .$$

As in the proof of theorem 2.3, we need to control the difference between $n\mu_m$ and 
$E[S_{1,n}]$. By Wald’s theorem, one has 

$$E[S_{1,n}] = \sum_{k=1}^{r-1} E[G_k]\, e_{k,m} .$$

By definition of $G_k$, one has 

$$E[G_k] \ge \frac{n p_k^*}{km} - [...] .$$







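Remarking that $e_{k,m} \le m$ for all k, one gets 

$$n\mu_m - E[S_{1,n}] = \sum_{k=1}^{r-1} \left( \frac{n p_k^*}{km} - E[G_k] \right) e_{k,m} + \sum_{k=r}^{\infty} \frac{n p_k^*}{km}\, e_{k,m} \le (r-1)(2 + 2m) + \frac{n\, q_r^*}{r} .$$

Under (10), the same choice of r(n) as in the proof of theorem 2.3 ensures that 
this difference is small compared to $\sqrt{n}$: 

$$r(n) = \left\lfloor (cn)^{\frac{1}{2+\alpha}} \right\rfloor + 1 . \qquad (16)$$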

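We also need to check that $Var[S_{1,n}] - n\sigma_m^2 = o(n)$. One has: 

$$Var[S_{1,n}] = \sum_{k=1}^{r-1} \left( E[G_k]\, v_{k,m} + Var[G_k]\, e_{k,m}^2 \right) + \sum_{k \ne h = 1}^{r-1} Cov[G_k, G_h]\, e_{k,m} e_{h,m} .$$

Using the definition (15) of $G_k$, one easily gets: 

$$[...] + O(1) .$$

From this one deduces: 

$$Var[S_{1,n}] = n \sum_{k=1}^{r-1} p_k^* \left( \frac{v_{k,m}}{km} + \frac{e_{k,m}^2}{(km)^2} \right) - n \left( \sum_{k=1}^{r-1} p_k^* \, \frac{e_{k,m}}{km} \right)^{2} + O(r^2) = n\sigma_m^2 + o(n) ,$$

still using expression (16) for r(n). 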

Let us now turn to the upper bound. The number of remaining bins $B_{n,m}$ 
certainly increases if one neglects to rearrange non-homogeneous groups. It also 
increases if all items of size < 1/r are replaced by items of size 1/r in the original 
list and none of them disappears. Let $M_r$ denote the number of items of type $\ge r$. 
The upper bound $S_{2,n}$ is the following: 

$$S_{2,n} = S_{1,n} + (M_r / r + rm) .$$

One has for all x: 

$$P[B_{n,m} \le x] \ge P[S_{2,n} \le x] .$$

As before, one can check that $E[S_{2,n}] = n\mu_m + o(\sqrt{n})$ and $Var[S_{2,n}] = n\sigma_m^2 + o(n)$. 

To finish the proof along the same lines as that of theorem 2.3, we need to 
check that the law of large numbers and the central limit theorem hold for $S_{1,n}$ 
and $S_{2,n}$, which are sums of random numbers of r.v.’s. We shall do it for $S_{1,n}$; 







similar arguments hold for $S_{2,n}$. The law of large numbers is the easy part. By 
formula (15), $G_k$ increases a.s. to infinity and 

$$\lim_{n \to \infty} \frac{G_k}{n} = \frac{p_k^*}{km} \quad \text{a.s.}$$

Hence: 

$$\lim_{n \to \infty} \frac{1}{n} \sum_{l=1}^{G_k} K_k^{(l)} = \frac{p_k^* \, e_{k,m}}{km} \quad \text{a.s.}$$

Using again the expression (16) for r(n), it follows that 

$$\lim_{n \to \infty} \frac{S_{1,n}}{n} = \mu_m \quad \text{a.s.}$$



The central limit theorem is not as straightforward. Here are the main steps. 

We first check that the vector $(G_k)_{1 \le k \le r-1}$ is asymptotically normal. Consider 
the vector $(N_1, \dots, N_{r-1}, M_r)$. Its distribution is multinomial, with parameters 
n and $(p_1^*, \dots, p_{r-1}^*, q_r^*)$. From there, and formula (15), it follows that the vector 
$(G_k)_{1 \le k \le r-1}$ is asymptotically normal ($G_k$ essentially behaves as $N_k/(km)$). 
More precisely, define for all $k \ge 1$ 

$$\tilde{p}_k = \frac{p_k^*}{km} \quad \text{and} \quad H_k = \frac{G_k - n \tilde{p}_k}{\sqrt{n}} .$$

The distribution of the random vector $(H_k)_{1 \le k \le r-1}$ converges to the multidimensional 
Gaussian distribution with null expectation and covariance matrix $C = 
(c_{k,h})_{1 \le k,h \le r-1}$, given by: 

$$c_{k,k} = \tilde{p}_k (1 - \tilde{p}_k) \quad \text{and} \quad c_{k,h} = -\tilde{p}_k \tilde{p}_h , \ \text{for } k \ne h .$$

For $k \ge 1$, consider the partial sum 

$$S_{k,m} = \sum_{l=1}^{G_k} K_k^{(l)} .$$

Using the classical technique of characteristic functions, one can show that the 
distribution of the vector 

$$\left( \frac{1}{\sqrt{n}} \left( S_{k,m} - n \tilde{p}_k e_{k,m} \right) \right)_{1 \le k \le r-1}$$

converges to the multidimensional Gaussian distribution with null expectation and 
covariance matrix 

$$D_1 + D_2 C D_2 ,$$

where $D_1$ and $D_2$ are the following diagonal matrices. 

$$D_1 = Diag\left( (\tilde{p}_k v_{k,m})_{1 \le k \le r-1} \right) \quad \text{and} \quad D_2 = Diag\left( (e_{k,m})_{1 \le k \le r-1} \right) .$$

Summing coordinates, it follows that $S_{1,n}$ is asymptotically normal, for any fixed 
r. There remains to let r = r(n) tend to infinity, using (16). The already given 
estimates on $E[S_{1,n}]$ and $Var[S_{1,n}]$ yield that 

$$\frac{1}{\sqrt{n \sigma_m^2}} \left( S_{1,n} - n\mu_m \right)$$

converges in distribution to the standard Gaussian distribution. 

□ 



□ 




Probabilistic Bin Packing Problem 






References 

[1] B.S. Baker and E.G. Coffman Jr. A tight asymptotic bound for next fit decreasing 
bin packing. SIAM Journal on Algebraic and Discrete Methods, 2:147-152, 1981. 

[2] M. Bellalouna. Problèmes d'optimisation combinatoire probabilistes. PhD thesis,
École Nationale des Ponts et Chaussées, Paris, 1993.

[3] D. Bertsimas. Probabilistic Combinatorial Optimisation Problems. PhD thesis, Mas- 
sachusetts Institute of Technology, Cambridge, Mass., 1988. 

[4] D. Bertsimas, P. Jaillet, and A. Odoni. A priori optimization. Operations Research, 
38:1019-1033, 1990. 

[5] E.G. Coffman, Jr., M.R. Garey, and D.S. Johnson. Approximation algorithms for bin
packing - a survey. In D. Hochbaum, editor, Approximation algorithms for NP-Hard
problems, pages 46-93. PWS Publishing, Boston, 1996.

[6] J. Csirik, J.B.G. Frenk, A.M. Frieze, G. Galambos, and A.H.G. Rinnooy Kan. A
probabilistic analysis of the Next-Fit decreasing bin packing heuristic. Oper. Res.
Lett., 5(5):233-236, 1986.

[7] J. Csirik and E. Mate. The probabilistic behaviour of the NFD bin packing algorithm. 
Acta Cyber., 7:241-245, 1986. 

[8] W. Feller. An introduction to probability theory and its applications, volume I. Wiley, 
London, 3rd edition, 1968. 

[9] W. Feller. An introduction to probability theory and its applications, volume II. Wiley, 
London, 2nd edition, 1971. 

[10] M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory
of NP-Completeness. Freeman, San Francisco, 1979.

[11] M. Hofri. Probabilistic analysis of algorithms. Springer-Verlag, New York, 1987. 

[12] M. Hofri. Analysis of algorithms - Mathematical methods, computational tools. Ox- 
ford University Press, Oxford, 1995. 

[13] M. Hofri and S. Kamhi. A stochastic analysis of the NFD bin packing algorithm. 
Journal of Algorithms, 7:489-509, 1986. 

[14] P. Jaillet. A priori solution of a traveling salesman problem in which a random subset 
of the customers are visited. Operations Research, 36:929-936, 1988. 

[15] P. Jaillet. Analysis of probabilistic combinatorial optimization problems in Euclidean
spaces. Mathematics of Operations Research, 18:51-71, 1993.

[16] V.V. Petrov. Limit theorems of probability theory. Oxford University Press, Oxford, 
1995. 

[17] W.T. Rhee. Probabilistic analysis of the next fit decreasing algorithm for bin-packing.
Oper. Res. Lett., 6(4):189-191, 1987.

[18] W.T. Rhee. Correction to: Probabilistic analysis of the next fit decreasing algorithm 
for bin-packing. Oper. Res. Lett., 7:211, 1988. 

Monia Bellalouna, Salma Souissi

CRISTAL pôle GRIFT, École Nationale des Sciences de l'Informatique, Tunisia

{monia.bellalouna,salma.souissi}@ensi.rnu.tn

Bernard Ycart

MAP5 CNRS UMR 8145, Université Paris 5, France

ycart@math-info.univ-paris5.fr 




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Distribution of WHT Recurrences¹

Pawel Hitczenko, Jeremy R. Johnson, and Hung-Jen Huang 



ABSTRACT: This work explores the performance of a family of algorithms 
for computing the Walsh-Hadamard transform (WHT), a useful computation in 
signal and image processing [1, 2, 5]. The algorithms exhibit a wide range of per- 
formance and it is non-trivial to determine which algorithm is optimal on a given 
computer [4] . This paper provides a theoretical basis for the performance analysis. 
Performance is modelled by a family of recurrence relations that determine the 
number of instructions required to execute a given algorithm, and are related to 
standard divide and conquer recurrences [6, 3]. However, since there are a variable 
number of recursive parts which can grow to infinity as the input size increases, 
new techniques are required for their analysis. In the full version of this paper, 
the minimum, maximum, expected values, and variances are calculated and the 
limiting distribution is obtained. 

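The Walsh-Hadamard Transform. The WHT of a signal $x$, of size $N = 2^n$, is
the matrix-vector product $\mathrm{WHT}_N \cdot x$, where

$$\mathrm{WHT}_N = \bigotimes_{i=1}^{n} \mathrm{DFT}_2 = \overbrace{\mathrm{DFT}_2 \otimes \cdots \otimes \mathrm{DFT}_2}^{n}.$$

The matrix

$$\mathrm{DFT}_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

is the 2-point DFT matrix, and $\otimes$ denotes the tensor or Kronecker product. Let
$n = n_1 + \cdots + n_t$ and let $I_m$ denote the $m \times m$ identity matrix, then

$$\mathrm{WHT}_{2^n} = \prod_{i=1}^{t} \left( I_{2^{n_1 + \cdots + n_{i-1}}} \otimes \mathrm{WHT}_{2^{n_i}} \otimes I_{2^{n_{i+1} + \cdots + n_t}} \right). \qquad (1)$$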

This equation provides a family of divide and conquer algorithms for computing 
the WHT (implemented with a triply nested loop in [4]) and provides a mechanism 
for exploring different breakdown strategies. 
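The factorization (1) can be checked numerically on small sizes. The following is a minimal sketch, not the triply nested loop implementation of [4]: it builds dense matrices with plain Python lists, and the helper names (`kron`, `identity`, `matmul`, `wht`, `wht_factored`) are illustrative assumptions introduced here.

```python
from functools import reduce

DFT2 = [[1, 1], [1, -1]]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def identity(m):
    """The m x m identity matrix."""
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]


def matmul(a, b):
    """Ordinary matrix product."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def wht(n):
    """WHT of size 2**n as the n-fold Kronecker power of DFT2 (n >= 1)."""
    return reduce(kron, [DFT2] * n)


def wht_factored(parts):
    """Right-hand side of (1) for the composition n = n_1 + ... + n_t."""
    n = sum(parts)
    result = identity(2 ** n)
    done = 0  # n_1 + ... + n_{i-1}
    for n_i in parts:
        left = identity(2 ** done)
        right = identity(2 ** (n - done - n_i))
        result = matmul(result, kron(kron(left, wht(n_i)), right))
        done += n_i
    return result


if __name__ == "__main__":
    print(wht_factored([1, 2]) == wht(3))
```

Each composition of $n$ gives a different sequence of factors but the same product, which is why the choice of breakdown strategy affects only the cost and not the result.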

Performance Model for the WHT. Let $W_{2^n}$ be a WHT algorithm, and let
$A(n)$ be the number of times the recursive WHT procedure is called, $A_l(n)$ the
number of times the straight-line code for $\mathrm{WHT}_{2^l}$ is called. Finally, let $L_i(n)$,
$i = 1, 2, 3$, be the number of times the outermost, middle, and innermost loops are
executed.

Then the number of instructions required to execute $W_{2^n}$ is equal to

$$\alpha A(n) + \sum_{l=1}^{3} \alpha_l A_l(n) + \sum_{i=1}^{3} \beta_i L_i(n),$$

where $\alpha$ is the number of instructions for the code in the compiled WHT procedure
executed outside the loops, $\alpha_l$ is the number of instructions in the compiled
straight-line code implementations of small WHT's (the largest size considered in
practice was 8), and $\beta_i$, $i = 1, 2, 3$, is the number of instructions executed in the
outermost, middle, and innermost loops in the compiled WHT procedure.

¹ Supported in part by NSA #MSPF-02G-043 and NSF ITR/NGS #0325687.

The functions $A(n)$, $A_l(n)$, $L_i(n)$ satisfy recurrences that can be derived
from (1). Suppose $n = n_1 + \cdots + n_t$ is the composition of $n$ corresponding to
the factorization in (1), then the functions $A(n)$, $A_l(n)$, $L_i(n)$ satisfy recurrence
relations of the form F{n) = F{rii) -h /(z)}, where /(z) depends on 

the particular recurrence and is equal to 1/t, 0, 1, hn^-i respectively. 

While it is not possible to obtain closed forms for all of these recurrences, it 
is possible to determine the algorithm with the minimum and maximum number 
of instructions. It is also possible to determine the expected value and variance 
and the limiting distribution. In particular, it is possible to show that in the limit 
the distribution approaches a normal distribution. 

Since instruction count by itself does not accurately predict performance on 
modern processors, where pipeline performance, instruction level parallelism, and 
the memory hierarchy significantly affect performance, these results can not be 
used by themselves to model performance and select algorithms. Nonetheless, they 
give insight into the wide range of performance and the structure of the search 
space, and the techniques developed to analyze the recurrences provide tools for 
analyzing divide and conquer recurrences with varying numbers of recursive calls. 



References 

[1] K.G. Beauchamp. Applications of Walsh and related functions. Academic Press, 
1984. 

[2] D. F. Elliott and K. R. Rao. Fast Transforms: Algorithms, Analyses, Applications. 
Academic Press, 1982. 

[3] H-K. Hwang and R. Neininger. Phase change of limit laws in the quicksort recurrence 
under varying toll functions. SIAM J. Comput., 31:1687-1722, 2002. 

[4] J. Johnson and M. Püschel. In Search for the Optimal Walsh-Hadamard Transform.
In Proceedings ICASSP, volume IV, pages 3347-3350, 2000.

[5] F.J. MacWilliams and N.J. Sloane. The theory of error-correcting codes. North-
Holland Publ. Comp., 1992.

[6] H. Mahmoud. Sorting. A Distribution Theory. Wiley, 2000. 

Pawel Hitczenko 

Department of Mathematics, Drexel University, Philadelphia, PA 19104
phitczenko@cs.drexel.edu

Jeremy R. Johnson

Department of Computer Science, Drexel University, Philadelphia, PA 19104
jjohnson@cs.drexel.edu

Hung-Jen Huang

Department of Computer Science, Drexel University, Philadelphia, PA 19104
hjhuang@cs.drexel.edu




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Probabilistic Analysis for Randomized Game 
Tree Evaluation 

Tamur Ali Khan and Ralph Neininger 

ABSTRACT: We give a probabilistic analysis for the randomized game tree 
evaluation algorithm of Snir. We first show that there exists an input such that the 
running time, measured as the number of external nodes read by the algorithm, on 
that input is maximal in stochastic order among all possible inputs. For this worst 
case input we identify the exact expectation of the number of external nodes read 
by the algorithm, give the asymptotic order of the variance including the leading 
constant, provide a limit law for an appropriate normalization as well as a tail 
bound estimating large deviations. Our tail bound improves upon the exponent 
of an earlier bound due to Karp and Zhang, where sub-Gaussian tails were shown 
based on an approach using multi-type branching processes and Azuma's inequality.
Our approach rests on a direct, inductive estimate of the moment generating
function. 



1. Introduction 

In this note we analyze the performance of the randomized algorithm to evaluate
Boolean decision trees proposed by Snir (1985). Given is a complete binary tree of
height $2k$, $k \ge 1$, where the root (at depth 0) is labeled $\wedge$, as are all internal nodes
with even depth; all internal nodes with odd depth are labeled $\vee$. The $n = 2^{2k}$
external nodes are labeled either 0 or 1 and the objective is to calculate the value
of the root. For each node its value is given as the value of the operation labeled at
that node applied to the values of its children. The cost for evaluating the Boolean
decision tree is measured as the number of external nodes read by the algorithm.

Snir proposed and analyzed the following randomized algorithm to evaluate
a Boolean decision tree: At each node one chooses randomly (with probability
1/2) one of its children and calculates its value recursively. If the result allows
one to identify the value of the node (that is, a 0 for a $\wedge$-labeled node and a 1 for a
$\vee$-labeled node, respectively) one is done; otherwise the other child's value also
has to be calculated recursively in order to obtain the value of the node. Applying
this to the root of the tree yields the value of the Boolean decision tree.
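The procedure just described can be sketched in a few lines. This is an illustrative reading of the description above, not code from Snir's paper: the flat-list layout of the external nodes, the function names `evaluate` and `snir_cost`, and the counter argument are all assumptions made for the example.

```python
import random


def evaluate(leaves, lo, hi, is_and, rng, counter):
    """Evaluate the subtree over leaves[lo:hi]; counter[0] counts leaves read."""
    if hi - lo == 1:
        counter[0] += 1
        return leaves[lo]
    mid = (lo + hi) // 2
    halves = [(lo, mid), (mid, hi)]
    if rng.random() < 0.5:  # pick the child to evaluate first uniformly at random
        halves.reverse()
    first = evaluate(leaves, halves[0][0], halves[0][1], not is_and, rng, counter)
    # A 0 already decides an AND node, a 1 already decides an OR node.
    if first == (0 if is_and else 1):
        return first
    # Otherwise the node takes the value of the other child.
    return evaluate(leaves, halves[1][0], halves[1][1], not is_and, rng, counter)


def snir_cost(leaves, rng=None):
    """Return (value of the root, number of external nodes read).

    The root is an AND node and labels alternate by depth, so len(leaves)
    is expected to be 4**k for some k >= 1.
    """
    if rng is None:
        rng = random.Random()
    counter = [0]
    value = evaluate(leaves, 0, len(leaves), True, rng, counter)
    return value, counter[0]


if __name__ == "__main__":
    print(snir_cost([0, 1, 0, 1], random.Random(1)))
```

For the input `[0, 0, 0, 0]` the cost is always 2, while for `[0, 1, 0, 1]` it varies between 2 and 4 depending on the random choices.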

The advantage of this algorithm over any deterministic algorithm is that
for any input at the external nodes its expected cost is sublinear in $n$, whereas
any deterministic algorithm has linear worst case cost. More precisely, Saks and
Wigderson (1986) obtained that the maximum expected cost is of the order $O(n^\alpha)$
with $\alpha = \log_2((1 + \sqrt{33})/4) = 0.753\ldots$ and showed that this is also a lower bound
on the maximum expected cost for any other randomized algorithm to evaluate a
Boolean decision tree; see also Motwani and Raghavan (1995, Chapter 2) for an
account on this subject. Further analysis was given by Karp and Zhang (1995).
For certain regular inputs at the external nodes the cost of the algorithm can be
represented via 2-type Galton-Watson processes. Karp and Zhang showed that
the normalized cost has sub-Gaussian tails. That argument was based on Azuma's
inequality.

We denote the input of 0's and 1's at the external nodes as a vector $v \in \{0,1\}^n$
and the number of external nodes read by the algorithm on input $v$ by $C(v)$. We
will see subsequently that for a particular $v^* \in \{0,1\}^n$ not only the expectation of
the cost of the algorithm is maximized, i.e., $\mathbb{E}\,C(v^*) = \max_{v \in \{0,1\}^n} \mathbb{E}\,C(v)$, but also
that $C(v^*)$ is maximal in stochastic order, $C(v) \preceq C(v^*)$ for all $v \in \{0,1\}^n$. Here,
$X \preceq Y$ for random variables $X, Y$ denotes that the corresponding distribution
functions $F_X, F_Y$ satisfy $F_X(x) \ge F_Y(x)$ for all $x \in \mathbb{R}$, or, equivalently, that
there are realizations $X', Y'$ of the distributions $\mathcal{L}(X), \mathcal{L}(Y)$ of $X, Y$ on a joint
probability space such that we pointwise have $X' \le Y'$.

From this perspective it is reasonable to consider $C(v^*)$ as the worst case
complexity of the randomized algorithm and to analyze its asymptotic probabilistic
behavior. Our results for the exact mean of $C(v^*)$, the asymptotic growth of its
variance including the evaluation of the leading constant, a limit law for $C(v^*)$
after normalization as $k \to \infty$ together with an explicit tail estimate are based on
a recursive description of the problem. Since $v^*$ is a regular input in the sense of
Karp and Zhang, also their 2-type Galton-Watson approach applies.

Our main finding is an improvement of the tail bound $\exp(-\mathrm{const}\, t^2)$ for
$t > 0$, to $\exp(-\mathrm{const}\, t^\kappa)$ with $1 < \kappa < 1/(1 - \alpha) \approx 4.06$, see Theorem 3.6.
This is based on a direct, inductive estimate of the moment generating function.
Our approach is also applicable to any regular input as well as to other related
problems.

The paper is organized as follows: In section 2 we explain how a worst case
input is obtained. Section 3 contains the statements of the results. In sections 4
and 5 the 2-type branching process of Karp and Zhang (1995) is recalled and the
recursive description of the quantities that our analysis is based on is introduced.
Section 6 contains the proofs of our results and section 7 has extensions to $m$-ary
Boolean decision trees.



2. Worst case input 

In this section we explain how a worst case input is constructed. We first have
a look at the case $k = 1$ and $v \in \{0,1\}^4$ such that the decision tree is evaluated
to 1 at the root. Clearly both children of the root have to lead to an evaluation
of 1. Now each pair of external nodes attached to the children needs to have at
least one value 1. Note that the algorithm reads in both pairs of external nodes
until it finds the first 1. Hence there will in total be read two 1's no matter how
$v \in \{0,1\}^4$ is drawn among the choices that lead to an evaluation of 1 for the
decision tree. Clearly, to maximize the number of 0's being read we choose in each
pair of external nodes one 0 and one 1. Then both 0's are being read independently
with probability 1/2. Hence, $v_1 = (0,1,0,1)$ stochastically maximizes $C(v)$ for all
$v \in \{0,1\}^4$ such that the decision tree evaluates to 1, see Figure 1.

Analogously look at the case $k = 1$ and $v \in \{0,1\}^4$ such that the decision
tree is evaluated to 0. Clearly, one child of the root has to have the value 0, whose
external nodes attached need to have both values 0. If we choose also value 0 for
the other child of the root, we are led to $v = (0,0,0,0)$, and the algorithm reads
exactly 2 external nodes with values both 0. Therefore, to stochastically maximize
$C(v)$ we choose the second child of the root with value 1 and again its external
nodes attached with values 0 and 1. Then, $v_0 = (0,0,0,1)$ stochastically maximizes
$C(v)$ for all $v \in \{0,1\}^4$ for which the decision tree evaluates to 0, see Figure 1.

Since we have $C(v_0) \preceq C(v_1)$ it follows that $v^* = (0,1,0,1)$ is a choice with
$C(v) \preceq C(v^*)$ for all $v \in \{0,1\}^4$. For general $k \ge 2$ a corresponding $v^* = v^*(k)$
can recursively be constructed from $v^*(k-1)$ as follows: Each component 0 in
$v^*(k-1)$ is replaced by the block 0, 0, 0, 1, whereas each 1 is replaced by the block
0, 1, 0, 1. For example, for $k = 3$, this yields

$$\begin{aligned}
v^* = (\,&0,0,0,1,\ 0,0,0,1,\ 0,0,0,1,\ 0,1,0,1,\\
&0,0,0,1,\ 0,1,0,1,\ 0,0,0,1,\ 0,1,0,1,\\
&0,0,0,1,\ 0,0,0,1,\ 0,0,0,1,\ 0,1,0,1,\\
&0,0,0,1,\ 0,1,0,1,\ 0,0,0,1,\ 0,1,0,1\,).
\end{aligned}$$

In Proposition 3.1 we show that this construction yields a $v^*$ with $C(v) \preceq C(v^*)$
for all $v \in \{0,1\}^n$ and $k \ge 1$.
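The block substitution is easy to mechanize. The sketch below is only an illustration of the construction described above; the names `BLOCKS` and `worst_case_input` and the optional `start` argument are assumptions introduced here.

```python
# Replacement blocks from the construction: 0 -> 0,0,0,1 and 1 -> 0,1,0,1.
BLOCKS = {0: (0, 0, 0, 1), 1: (0, 1, 0, 1)}


def worst_case_input(k, start=(0, 1, 0, 1)):
    """Build the input for height 2k by k - 1 rounds of block substitution.

    The default start (0, 1, 0, 1) is the k = 1 worst case evaluating to 1;
    start=(0, 0, 0, 1) gives the analogous input evaluating to 0.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    v = list(start)
    for _ in range(k - 1):
        v = [bit for component in v for bit in BLOCKS[component]]
    return v


if __name__ == "__main__":
    v = worst_case_input(3)
    for row in range(4):
        print(v[16 * row:16 * (row + 1)])
```

For `k = 3` the four printed rows of 16 entries can be compared with the example vector given in the text.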




Figure 1: Shown are decision trees for k = 1 evaluating at the root to 1 and 0, 
respectively, together with a choice for the external nodes that stochastically max- 
imizes the number of external nodes read by the algorithm. 



If we would only want to stochastically maximize the cost over all $v \in
R_0(n) \subset \{0,1\}^n$ that evaluate to a 0 at the root, the same recursive construction
of replacing digits by corresponding blocks, starting with $v_0 = (0,0,0,1)$,
yields a $v_* \in R_0(n)$ such that $C(v) \preceq C(v_*)$ for all $v \in R_0(n)$.



3. Results 

We assume that we have $n = 2^{2k}$ with $k \ge 1$ and denote by $v^* \in \{0,1\}^n$ an input
as constructed in section 2.

Proposition 3.1. For $v^* \in \{0,1\}^n$ as defined in section 2 we have $C(v) \preceq C(v^*)$
for all $v \in \{0,1\}^n$.

The stochastic worst case behavior $C(v^*)$ of the randomized game tree evaluation
algorithm has the following asymptotic properties: The subsequent theorems
describe the behavior of mean, variance, limit distribution, and large deviations of
$C(v^*)$. For the mean we have:



Theorem 3.2. The expectation of $C(v^*)$ is given by $\mathbb{E}\,C(v^*) = c_1 n^\alpha - c_2 n^\beta$, with

$$\alpha = \log_2 \frac{1 + \sqrt{33}}{4}, \qquad \beta = \log_2 \frac{\sqrt{33} - 1}{4}, \qquad c_1 = \frac{1}{2} + \frac{7}{2\sqrt{33}}, \qquad c_2 = c_1 - 1.$$


We denote for sequences $(a_k)$, $(b_k)$ by $a_k \sim b_k$ asymptotic equivalence, i.e.,
$a_k/b_k \to 1$ as $k \to \infty$. Then we have for the variance of $C(v^*)$:

Theorem 3.3. The variance of $C(v^*)$ satisfies asymptotically $\mathrm{Var}\, C(v^*) \sim d\, n^{2\alpha}$
as $k \to \infty$, where $d = 0.0938$. The constant $d$ can also be given in closed form.



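For random variables $X$, $Y$ we denote by $X \stackrel{d}{=} Y$ equality in distribution, i.e.,
$\mathcal{L}(X) = \mathcal{L}(Y)$. Then we have the following limit law for $C(v^*)$:

Theorem 3.4. For $C(v^*)$ we have after normalization convergence in distribution,

$$\frac{C(v^*)}{n^\alpha} \xrightarrow{\ d\ } C, \qquad k \to \infty,$$

where the distribution of $C$ is given as $\mathcal{L}(C) = \mathcal{L}(G_1)$ and $\mathcal{L}(G) = \mathcal{L}(G_0, G_1)$ is
characterized by $\mathbb{E}\,\|G\|^2 < \infty$, $\mathbb{E}\,G = (c_0, c_1)$, with $c_0 = 1/2 + 5/(2\sqrt{33})$, and

$$G \stackrel{d}{=} \frac{1}{4^\alpha} \left\{ G^{(1)} + G^{(2)} + \begin{bmatrix} B_1 B_2 & 0 \\ 1 - B_2 & 0 \end{bmatrix} G^{(3)} + \begin{bmatrix} 0 & B_1 \\ B_1 & 0 \end{bmatrix} G^{(4)} \right\},$$

with $G^{(1)}, \ldots, G^{(4)}, B_1, B_2$ independent with $\mathcal{L}(G^{(r)}) = \mathcal{L}(G)$, for $r = 1, \ldots, 4$, and
$\mathcal{L}(B_1) = \mathcal{L}(B_2) = B(1/2)$. Here, $B(1/2)$ denotes the Bernoulli(1/2) distribution.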



For the estimate of large deviations we rely on Chernoff's bounding technique.
We need to follow a bivariate setting for the vector $(C(v_*), C(v^*))$ as introduced
in section 5. The following bound on the moment generating function is obtained:

Proposition 3.5. There exists a sequence $(Y_k)_{k \ge 0} = (Y_{k,0}, Y_{k,1})_{k \ge 0}$ of bivariate
random variables with marginal distributions $\mathcal{L}((C(v_*) - \mathbb{E}\,C(v_*))/n^\alpha)$,
$\mathcal{L}((C(v^*) - \mathbb{E}\,C(v^*))/n^\alpha)$ such that for all $q > 1/\alpha \approx 1.33$ there is a $K > 0$ with

$$\mathbb{E} \exp\langle s, Y_k \rangle \le \exp(K \|s\|^q) \qquad (1)$$

for all $s \in \mathbb{R}^2$ and $k \ge 0$. An explicit value for $K = K_q$ is given in (4).

The bound on the moment generating function in the previous proposition 
implies a large deviation estimate via Chernoff bounds: 



Theorem 3.6. For all $1 < \kappa < 1/(1 - \alpha) \approx 4.06$ there exists an $L > 0$ such that
for any $t > 0$ and $n = 2^{2k}$

$$\mathbb{P}\left( \frac{C(v^*) - \mathbb{E}\,C(v^*)}{n^\alpha} > t \right) \le \exp(-L t^\kappa). \qquad (2)$$

An explicit value for $L$ is given in (5). The same bound applies to the left tail.

The approach of Karp and Zhang (1995) based on Azuma's inequality gives
the tail bound $\exp(-L' t^2)$ for an explicitly known $L'$. For $\kappa = 2$ the prefactor
$L = L_2$ in Theorem 3.6 can also be evaluated and satisfies $L_2 > 11 L'$.



4. Karp and Zhang’s 2-type branching process 

For the analysis of $C(v^*)$ note that whenever the algorithm has to evaluate the
value of a node at a certain depth that yields a 1, according to the discussion of
section 2, the algorithm has to evaluate the values of two nodes of depths two levels
below that each yield a 1, and $B_3 + B_4$ nodes of depths two levels below that each
yield a 0, cf. Figure 1. Here, $B_3, B_4$ are independent Bernoulli $B(1/2)$ distributed
random variables. Analogously, when the algorithm has to evaluate the value of
a node at a certain depth that yields 0, two levels below it has to evaluate $B_1$
nodes yielding a 1 and $2 + B_1 B_2$ nodes yielding a 0, where $B_1, B_2$ are independent
$B(1/2)$ distributed random variables. Here, the event $\{B_1 = 1\}$ corresponds to the
algorithm first checking the right child of the node to be evaluated and $\{B_2 = 1\}$
to first checking the left child of that child, cf. Figure 1. Since at each node the
child being evaluated first is independently drawn from all other choices, this gives
rise to the following 2-type Galton-Watson branching process.

We have individuals of type 0 and 1 where the population of the $k$-th generation
corresponds to the number of nodes at depth $2k$ that are read by the
algorithm. We consider processes starting either with an individual of type 1 or
type 0 and assume that the algorithm is applied to the worst case inputs $v^*$ and
$v_*$ respectively. Then we have the following offspring distributions: An individual
of type 1 has an offspring of 2 individuals of type 1 and $B_3 + B_4$ individuals of
type 0. An individual of type 0 has an offspring of $B_1$ individuals of type 1 and
$2 + B_1 B_2$ individuals of type 0. We denote the number of individuals of type 0
and 1 in generation $k$ by $(V_n^{(i)}, W_n^{(i)})$, when starting with an individual of type
$i = 0, 1$, where $n = 2^{2k}$. Note that for $v_*, v^* \in \{0,1\}^n$ we have the representations

$$C(v_*) = V_n^{(0)} + W_n^{(0)}, \qquad C(v^*) = V_n^{(1)} + W_n^{(1)}.$$

This is the approach of Karp and Zhang (1995) for regular inputs like $v_*$ and $v^*$. Hence,
part of the analysis of $C(v^*)$ can be reduced to the application of the theory of
multi-type branching processes; see for general reference Harris (1963) and Athreya
and Ney (1972), and for a survey on the application of branching processes to tree
structures and tree algorithms see Devroye (1998).

However, we will also use a recursive description of the problem. This will be
given in the next section and enables us to use as well results from the probabilistic
analysis of recursive algorithms by the contraction method.



5. The recursive point of view 

It is convenient to work as well with a recursive description of the distributions
$\mathcal{L}(C(v_*))$ and $\mathcal{L}(C(v^*))$. For this, we define the distributions of a bivariate random
sequence $(Z_n) = (Z_{n,0}, Z_{n,1})$ for all $n = 2^{2k}$, $k \ge 0$ by $Z_1 = (1, 1)$ and, for $k \ge 1$,

$$Z_n \stackrel{d}{=} Z^{(1)}_{n/4} + Z^{(2)}_{n/4} + \begin{bmatrix} B_1 B_2 & 0 \\ 1 - B_2 & 0 \end{bmatrix} Z^{(3)}_{n/4} + \begin{bmatrix} 0 & B_1 \\ B_1 & 0 \end{bmatrix} Z^{(4)}_{n/4},$$

where $Z^{(1)}_{n/4}, \ldots, Z^{(4)}_{n/4}, B_1, B_2$ are independent, $B_1, B_2$ are Bernoulli $B(1/2)$ distributed
and $\mathcal{L}(Z^{(1)}_{n/4}) = \cdots = \mathcal{L}(Z^{(4)}_{n/4}) = \mathcal{L}(Z_{n/4})$. It can directly be checked by
induction that the marginals of $Z_n$ satisfy $\mathcal{L}(Z_{n,0}) = \mathcal{L}(C(v_*))$ and $\mathcal{L}(Z_{n,1}) =
\mathcal{L}(C(v^*))$. Note that $Z_{n,0}$ and $Z_{n,1}$ become dependent, firstly, since we have coupled
the offspring distributions using for the second component of $Z_n$ again $B_1$
and $1 - B_2$ instead of $B_3$ and $B_4$, cf. section 4, and, secondly, since the first
component of $Z^{(3)}_{n/4}$ contributes to both components of $Z_n$. Sequences satisfying
recursive equations as $(Z_n)$ are being dealt with in a probabilistic framework, the
so-called contraction method; see Rösler (1991, 1992), Rachev and Rüschendorf
(1995), Rösler and Rüschendorf (2001), and Neininger and Rüschendorf (2004).
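Because each marginal of $Z_n$ depends only on independent copies of the marginals of $Z_{n/4}$, the recursion above determines the exact distributions of the two costs for small $k$. The following sketch computes them with exact rational arithmetic; it is an illustration under the recursion as stated in this section, and the function names (`convolve`, `mixture`, `cost_distributions`, `stochastically_le`) are assumptions introduced for the example.

```python
from fractions import Fraction


def convolve(p, q):
    """Distribution of the sum of two independent integer random variables."""
    out = {}
    for x, px in p.items():
        for y, py in q.items():
            out[x + y] = out.get(x + y, 0) + px * py
    return out


def mixture(parts):
    """Mix distributions; parts is a list of (weight, pmf) pairs."""
    out = {}
    for weight, pmf in parts:
        for x, px in pmf.items():
            out[x] = out.get(x, 0) + weight * px
    return out


def cost_distributions(k):
    """Return (pmf of Z_{n,0}, pmf of Z_{n,1}) for n = 4**k, starting from Z_1 = (1, 1)."""
    z0 = {1: Fraction(1)}
    z1 = {1: Fraction(1)}
    half, quarter = Fraction(1, 2), Fraction(1, 4)
    for _ in range(k):
        two0 = convolve(z0, z0)
        two1 = convolve(z1, z1)
        new0 = mixture([
            (half, two0),                                 # B1 = 0
            (quarter, convolve(two0, z1)),                # B1 = 1, B2 = 0
            (quarter, convolve(convolve(two0, z0), z1)),  # B1 = 1, B2 = 1
        ])
        new1 = mixture([
            (quarter, two1),                              # (1 - B2) + B1 = 0
            (half, convolve(two1, z0)),                   # (1 - B2) + B1 = 1
            (quarter, convolve(two1, convolve(z0, z0))),  # (1 - B2) + B1 = 2
        ])
        z0, z1 = new0, new1
    return z0, z1


def stochastically_le(p, q):
    """True if the distribution functions satisfy F_p(x) >= F_q(x) for all x."""
    cp = cq = 0
    for x in sorted(set(p) | set(q)):
        cp += p.get(x, 0)
        cq += q.get(x, 0)
        if cp < cq:
            return False
    return True


if __name__ == "__main__":
    z0, z1 = cost_distributions(1)
    print(sorted(z0.items()), sorted(z1.items()))
```

For $k = 1$ the two distributions agree with the direct case analysis of section 2, and `stochastically_le(z0, z1)` checks the ordering between the two marginals on small instances.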







6. Proofs 

In this section we sketch the proofs of the results stated in section 3. 



Proof of Proposition 3.1: (Sketch) We denote by $R_0(n), R_1(n) \subset \{0,1\}^n$ the sets
of vectors at the external nodes at depth $2k$ that yield an evaluation at the root
of the decision tree of value 0 and 1, respectively. From the discussion in section 2
we have

$$C(v) \preceq C(v_*), \quad v \in R_0(n), \qquad \text{and} \qquad C(v) \preceq C(v^*), \quad v \in R_1(n).$$

Hence, it remains to show that $C(v_*) \preceq C(v^*)$. This is shown by induction on
$k \ge 1$. For $k = 1$ this can directly be checked. For the step $k - 1 \to k$ assume that
we have $C(v_*(k-1)) \preceq C(v^*(k-1))$. It suffices to find realizations of the quantities
$(V_n^{(0)}, W_n^{(0)})$ and $(V_n^{(1)}, W_n^{(1)})$ on a joint probability space with $V_n^{(0)} + W_n^{(0)} \le
V_n^{(1)} + W_n^{(1)}$ almost surely, $n = 2^{2k}$.

For this we use (Ki/ 4 ^ for i = 1, 2, j = 1, . . . , 4 being indepen- 
dent for each z = 0, 1 and with B,B' Bernoulli 5(1/2) distributed, = 

= '^(^n/ 4 ) for * = and j By the induction 

hypothesis we may assume that we have versions of these random variates with 

^ “ 1, ... ,4. With this coupling we define 

{Vn^\ Wn^^) and {Vn^\Wn^^) according to the values of 5, 5': On {B = 1,5' = 0} 
we set 




and obtain . On the remaining sets {5 = 0, 5' = 

0}, {5 = 0, 5' = 1}, and {5 = 1,5' = 1} similar couplings of (Vn^\Wn^), 
{V^^\ can be defined with ■ 



Proof of Theorem 3.2: (Sketch) Assume that a generation has $(w_0, w_1)$ individuals
of type 0 and 1. Then, by the definition of the offspring distribution in section 4,
the expected number of individuals in the subsequent generation is given by

$$M \begin{pmatrix} w_0 \\ w_1 \end{pmatrix}, \qquad M := \begin{bmatrix} 9/4 & 1 \\ 1/2 & 2 \end{bmatrix}.$$

Since $C(v^*) = C(v^*(k))$ is the sum of the individuals at generation $k$ for the
process started with an individual of type 1 we obtain

$$\mathbb{E}\,C(v^*) = (1, 1)\, M^k \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

The matrix $M$ has the eigenvalues $\lambda_1 = (17 + \sqrt{33})/8$ and $\lambda_2 = (17 - \sqrt{33})/8$ and
its $k$-th power can be evaluated to

$$M^k = \frac{1}{2\sqrt{33}} \begin{bmatrix} (\sqrt{33} + 1)\lambda_1^k + (\sqrt{33} - 1)\lambda_2^k & 8(\lambda_1^k - \lambda_2^k) \\ 4(\lambda_1^k - \lambda_2^k) & (\sqrt{33} - 1)\lambda_1^k + (\sqrt{33} + 1)\lambda_2^k \end{bmatrix}.$$

From this, $\mathbb{E}\,C(v^*)$ and various constants needed subsequently can be read off.
Note that $\lambda_1^k = n^\alpha$ with $\alpha$ given in Theorem 3.2 and $n = 2^{2k}$. ■
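The closed form for the mean can be cross-checked against the matrix power directly. The sketch below is a numerical illustration only; it assumes the mean matrix with rows (9/4, 1) and (1/2, 2), the constants $c_1 = 1/2 + 7/(2\sqrt{33})$ and $c_2 = c_1 - 1$, and $\beta = \log_2((\sqrt{33} - 1)/4)$, and the function names are introduced here for the example.

```python
from fractions import Fraction
from math import log2, sqrt

# Mean offspring matrix: column j holds the expected offspring of a type-j individual.
M = [[Fraction(9, 4), Fraction(1)], [Fraction(1, 2), Fraction(2)]]


def mat_mul(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][t] * b[t][j] for t in range(2)) for j in range(2)] for i in range(2)]


def mat_pow(m, k):
    """k-th power of a 2 x 2 matrix by repeated multiplication."""
    result = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
    for _ in range(k):
        result = mat_mul(result, m)
    return result


def exact_mean(k):
    """(1, 1) M^k (0, 1)^T: expected cost when starting with one type-1 individual."""
    p = mat_pow(M, k)
    return p[0][1] + p[1][1]


def closed_form_mean(k):
    """c_1 n^alpha - c_2 n^beta with n = 4**k, in floating point."""
    n = 4 ** k
    alpha = log2((1 + sqrt(33)) / 4)
    beta = log2((sqrt(33) - 1) / 4)
    c1 = 0.5 + 7 / (2 * sqrt(33))
    c2 = c1 - 1
    return c1 * n ** alpha - c2 * n ** beta


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, exact_mean(k), closed_form_mean(k))
```

For $k = 1$ both expressions give 3, which matches the direct count of two 1's plus an expected one 0 read on the input $(0, 1, 0, 1)$.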



Before proving Theorem 3.3 it is convenient to first prove Theorem 3.4. 



Proof of Theorem 3.4: (Sketch) The 2-type branching process defined in section
4 is supercritical, nonsingular, and positive regular. Hence, a theorem of Harris
(1963) implies that

$$\frac{1}{n^\alpha} \left( V_n^{(1)}, W_n^{(1)} \right) \to Y\, (\nu_1, \nu_2)$$

almost surely, as $k \to \infty$, where $Y$ is a nonnegative random variable and $(\nu_1, \nu_2)$
a deterministic vector that could also be further specified. Thus we obtain
$C(v^*)/n^\alpha \to C$ in distribution, as $k \to \infty$, with $\mathcal{L}(C) = \mathcal{L}((\nu_1 + \nu_2) Y)$.

On the other hand the recursive formulation of section 5 leads after the
normalization $X_n := Z_n / n^\alpha$ to

$$X_n \stackrel{d}{=} \sum_{r=1}^{4} A_r X^{(r)}_{n/4}$$

for $k \ge 1$, where $A_1 = A_2 = (1/4^\alpha) I_2$, with the $2 \times 2$ identity matrix $I_2$, and

$$A_3 = \frac{1}{4^\alpha} \begin{bmatrix} B_1 B_2 & 0 \\ 1 - B_2 & 0 \end{bmatrix}, \qquad A_4 = \frac{1}{4^\alpha} \begin{bmatrix} 0 & B_1 \\ B_1 & 0 \end{bmatrix}, \qquad (3)$$

where $X^{(1)}_{n/4}, \ldots, X^{(4)}_{n/4}, B_1, B_2$ are independent with $\mathcal{L}(X^{(r)}_{n/4}) = \mathcal{L}(X_{n/4})$ for $r =
1, \ldots, 4$ and $\mathcal{L}(B_1) = \mathcal{L}(B_2) = B(1/2)$. It follows from the contraction method
that $X_n$ converges weakly and with all mixed second moments to some $G$ that
can be characterized as in Theorem 3.4. For details on how to apply the contraction
method, see Theorem 4.1 in Neininger (2001). Thus, we have $C(v^*)/n^\alpha \to G_1$ in
distribution. ■



Proof of Theorem 3.3: (Sketch) As shown in the proof of Theorem 3.4 we have
the convergence $X_n = Z_n / n^\alpha \to G$ for all mixed second moments. This, in
particular, implies $\mathrm{Var}\, X_{n,1} \to \mathrm{Var}\, G_1$. The variance of $G_1$ can be obtained
from the distributional identity for $G$ stated in Theorem 3.4. Then we obtain
$\mathrm{Var}\, C(v^*) = \mathrm{Var}(n^\alpha X_{n,1}) \sim d\, n^{2\alpha}$ with $d = \mathrm{Var}\, G_1$. ■







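Proof of Proposition 3.5: For $Y_n = (1/n^\alpha)(Z_n - \mathbb{E}\,Z_n)$ we have marginals $\mathcal{L}(Y_{n,1}) =
\mathcal{L}((C(v^*) - \mathbb{E}\,C(v^*))/n^\alpha)$ and $\mathcal{L}(Y_{n,0}) = \mathcal{L}((C(v_*) - \mathbb{E}\,C(v_*))/n^\alpha)$. The distributional
recurrence for $Z_n$ from section 5 implies the relation

$$Y_n \stackrel{d}{=} \sum_{r=1}^{4} A_r Y^{(r)}_{n/4} + b_n, \qquad k \ge 1,$$

with $Y^{(1)}_{n/4}, \ldots, Y^{(4)}_{n/4}, B_1, B_2$ independent, $\mathcal{L}(Y^{(r)}_{n/4}) = \mathcal{L}(Y_{n/4})$ for $r = 1, \ldots, 4$,
$\mathcal{L}(B_1) = \mathcal{L}(B_2) = B(1/2)$ and $b_n = (1/n^\alpha)\bigl(4^\alpha \sum_{r=1}^{4} A_r\, \mathbb{E}\,Z_{n/4} - \mathbb{E}\,Z_n\bigr)$. The
matrices $A_r$ are given in (3).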

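We prove the assertion by induction on $k$. For $k = 0$ we have $Y_1 = 0$, thus
the assertion is true. Assume the assertion is true for some $n/4 = 4^{k-1}$. Then,
conditioning on $(A_1, \ldots, A_4, b_n)$, denoting the distribution of this vector by $\sigma_n$,
and using the induction hypothesis, we obtain

$$\begin{aligned}
\mathbb{E} \exp\langle s, Y_n \rangle &= \int \exp\langle s, \beta_n \rangle \prod_{r=1}^{4} \mathbb{E} \exp\langle s, a_r Y_{n/4} \rangle \, d\sigma_n(a_1, \ldots, a_4, \beta_n) \\
&\le \int \exp\langle s, \beta_n \rangle \prod_{r=1}^{4} \exp\bigl(K \|a_r^T s\|^q\bigr) \, d\sigma_n(a_1, \ldots, a_4, \beta_n) \\
&\le \int \exp\Bigl(\langle s, \beta_n \rangle + K \|s\|^q \sum_{r=1}^{4} \|a_r\|_{\mathrm{op}}^q\Bigr) \, d\sigma_n(a_1, \ldots, a_4, \beta_n) \\
&= \mathbb{E} \exp\bigl(\langle s, b_n \rangle + K \|s\|^q U\bigr) \exp(K \|s\|^q),
\end{aligned}$$

with $U := \sum_{r=1}^{4} \|A_r\|_{\mathrm{op}}^q - 1 = 4^{-\alpha q}(2 + B_1 B_2 + (1 - B_2) + B_1) - 1$ and $\|A\|_{\mathrm{op}} =
\sup_{\|x\| = 1} \|Ax\|$ for matrices $A$. Hence, the proof is completed by showing

$$\sup_{k \ge 1} \mathbb{E} \exp\bigl(\langle s, b_n \rangle + K \|s\|^q U\bigr) \le 1,$$

for some appropriate $K > 0$. We denote $\xi := -\operatorname{ess\,sup} U = 1 - 4^{1 - \alpha q}$, thus $q > 1/\alpha$
implies $\xi > 0$.

Small $\|s\|$: First we consider small $\|s\|$ with $\|s\| \le c / \|b_n\|_{2,\infty}$ for some
$c > 0$, where $\|b_n\|_{2,\infty} := \bigl\| \|b_n\| \bigr\|_\infty$, the inner norm being the Euclidean norm. Note
that throughout we have $n = n(k) = 2^{2k}$. For these small $\|s\|$ we have

$$\mathbb{E} \exp\bigl(\langle s, b_n \rangle + K \|s\|^q U\bigr) \le \exp(-K \|s\|^q \xi)\, \mathbb{E} \exp\langle s, b_n \rangle$$

and, with $\mathbb{E} \langle s, b_n \rangle = 0$,

$$\begin{aligned}
\mathbb{E} \exp\langle s, b_n \rangle &= \mathbb{E} \Bigl[ 1 + \langle s, b_n \rangle + \sum_{j=2}^{\infty} \frac{\langle s, b_n \rangle^j}{j!} \Bigr] = 1 + \mathbb{E} \sum_{j=2}^{\infty} \frac{\langle s, b_n \rangle^j}{j!} \\
&\le 1 + \|s\|^2\, \mathbb{E} \|b_n\|^2 \sum_{j=2}^{\infty} \frac{\|s\|^{j-2} \|b_n\|_{2,\infty}^{j-2}}{j!} \\
&\le 1 + \|s\|^2\, \mathbb{E} \|b_n\|^2\, \frac{e^c - 1 - c}{c^2}.
\end{aligned}$$

Using $\exp(-K \|s\|^q \xi) \le 1/(1 + K \|s\|^q \xi)$ and with $\Psi(c) = (e^c - 1 - c)/c^2$ we obtain

$$\mathbb{E} \exp\bigl(\langle s, b_n \rangle + K \|s\|^q U\bigr) \le \frac{1 + \|s\|^2\, \mathbb{E} \|b_n\|^2\, \Psi(c)}{1 + K \|s\|^q \xi}.$$

Hence, we have to choose $K$ with

$$K \ge \frac{\Psi(c)}{\xi} \sup_{k \ge 1} \mathbb{E} \|b_n\|^2\, \|s\|^{2-q}.$$

With $\|s\| \le c / \sup_{k \ge 1} \|b_n\|_{2,\infty}$ a possible choice is

$$K = \frac{\sup_{k \ge 1} \mathbb{E} \|b_n\|^2}{\sup_{k \ge 1} \|b_n\|_{2,\infty}^{2-q}} \cdot \frac{\Psi_q(c)}{\xi}$$

with $\Psi_q(c) = (e^c - 1 - c)/c^q$.

Large $\|s\|$: For general $s \in \mathbb{R}^2$ we have

$$\langle s, b_n \rangle + K \|s\|^q U \le \|s\| \|b_n\| - K \|s\|^q \xi \le \|s\| \|b_n\|_{2,\infty} - K \|s\|^q \xi$$

and this is less than zero if

$$\|s\|^{q-1} > \frac{\sup_{k \ge 1} \|b_n\|_{2,\infty}}{K \xi} = \frac{\sup_{k \ge 1} \|b_n\|_{2,\infty}^{3-q}}{\sup_{k \ge 1} \mathbb{E} \|b_n\|^2\, \Psi_q(c)}.$$

If $\|s\|$ satisfies the latter inequality we call it large. Thus, for large $\|s\|$ we have
$\sup_{k \ge 1} \mathbb{E} \exp(\langle s, b_n \rangle + K \|s\|^q U) \le 1$.

In order to overlap the regions for small and large $\|s\|$ we need

$$\frac{e^c - 1 - c}{c} \ge \frac{\sup_{k \ge 1} \|b_n\|_{2,\infty}^2}{\sup_{k \ge 1} \mathbb{E} \|b_n\|^2}.$$

The right hand side of the latter display can be evaluated explicitly for our problem
and equals 104/77. Thus, this inequality is true for, e.g., $c = 1.53$. Hence, with the
explicit value

$$K_q = \frac{\sup_{k \ge 1} \mathbb{E} \|b_n\|^2}{\sup_{k \ge 1} \|b_n\|_{2,\infty}^{2-q}} \cdot \frac{e^{1.53} - 2.53}{1.53^q\, (1 - 4^{1 - \alpha q})} \qquad (4)$$

the proof is completed. ■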



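Proof of Theorem 3.6: By Chernoff's bounding technique we have, for $u > 0$ and
with Proposition 3.5,

$$\begin{aligned}
\mathbb{P}\left( \frac{C(v^*) - \mathbb{E}\,C(v^*)}{n^\alpha} > t \right) &= \mathbb{P}\bigl(\exp(u Y_{n,1}) > \exp(ut)\bigr) \\
&\le \mathbb{E} \exp(u Y_{n,1} - ut) \\
&= \mathbb{E} \exp\bigl(\langle (0, u), Y_n \rangle - ut\bigr) \\
&\le \exp(K_q u^q - ut),
\end{aligned}$$

for all $K_q$ as in Proposition 3.5 and (4). Minimizing over $u > 0$ we obtain the
bound

$$\mathbb{P}\left( \frac{C(v^*) - \mathbb{E}\,C(v^*)}{n^\alpha} > t \right) \le \exp(-L t^\kappa),$$

for $1 < \kappa < 1/(1 - \alpha)$, with

$$L = L_\kappa = K_{\kappa/(\kappa-1)}^{-(\kappa-1)}\, \frac{(\kappa - 1)^{\kappa-1}}{\kappa^\kappa} \qquad (5)$$

and $K_{\kappa/(\kappa-1)}$ given in (4). This completes the tail bound. ■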
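The minimization step in the Chernoff argument can be verified numerically. Setting the derivative of $K u^q - ut$ to zero gives $u = (t/(Kq))^{1/(q-1)}$ and the minimum $-L t^\kappa$ with $\kappa = q/(q-1)$ and $L = (\kappa-1)^{\kappa-1}/(\kappa^\kappa K^{\kappa-1})$. The sketch below compares this closed form with a brute-force grid search; the function names and the grid parameters are assumptions chosen for the illustration, and the values of `K`, `q`, `t` in the usage are arbitrary test inputs rather than constants from the paper.

```python
def chernoff_exponent(K, q, t):
    """Closed-form minimum over u > 0 of K * u**q - u * t, for q > 1."""
    kappa = q / (q - 1)
    L = (kappa - 1) ** (kappa - 1) / (kappa ** kappa * K ** (kappa - 1))
    return -L * t ** kappa


def brute_force_exponent(K, q, t, steps=100000, u_max=20.0):
    """Minimum of K * u**q - u * t over an evenly spaced grid in (0, u_max]."""
    grid = (u_max * i / steps for i in range(1, steps + 1))
    return min(K * u ** q - u * t for u in grid)


if __name__ == "__main__":
    for K, q, t in [(1.0, 2.0, 1.0), (0.7, 1.5, 2.0)]:
        print(chernoff_exponent(K, q, t), brute_force_exponent(K, q, t))
```

The relation $\kappa = q/(q-1)$ also explains the admissible range: $q > 1/\alpha$ is equivalent to $\kappa < 1/(1-\alpha)$.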



7. m-ary Boolean decision trees 

The analysis can be carried over to the case of $m$-ary Boolean decision trees. The
algorithm visits randomly chosen children and evaluates recursively their values
until the value of the root can be identified; the remaining children are discarded
afterwards. A worst case input $v^* \in \{0,1\}^n$ with $n = m^{2k}$ can be constructed
similarly. Then we have similar results for $C(v^*)$:



Theorem 7.1. For the worst case complexity C(v^) of evaluating an m-ary Boolean 
decision tree we have the following asymptotics: 

E(7(n*) = 

Var (^(n*) ~ 

C{v*) 






Crm 



>t] < exp(— ^ > 0, 






with constants a^, f3m, dm, > 0, £ ]R, and 1 < k < Km = 

1/(1 -«m). 

Numerical values for am, dm and Km are listed in Table 1. The distribution 
of Cm is given as L{Cm) = ^(Gi) and T(G) = H(GoiGi) is characterized by 

E ||Gf < oo, EG = and 



G = 



m— 1 r 



m 



=1 

m— 1 

+ E 



r=l r=l 

m—1 r 



0 lr{Uo) 

MUo) 0 



r,i=l 



lr{Uo)le{Ur) 0 

1 - le{Ur) 0 



Qir) 

G(r,£)l 



with = £(G(^)) = £(G('’’^)) = £(G) and Ur independent 

with L{Ur) = unif{0, ... ,m — 1} for all r, £ Here, we denote li{Y) l{z<y} for 
integer i and a random variable T, and we have 



(m) _ 1 , 
^0 — ^ 



m + 3 



2 2y^l6m + (m — 1)' 






- 



- + 



3m 4- 1 



2^yl6m + (m — 1)^ 







m      2       3       4       5       6       7       8
α_m    0.754   0.759   0.765   0.769   0.774   0.778   0.781
d_m    0.0938  0.0847  0.0782  0.0731  0.0689  0.0652  0.0619
κ_m    4.060   4.154   4.247   4.336   4.419   4.497   4.571

m      9       10      11      12      13      14      15
α_m    0.785   0.788   0.790   0.793   0.795   0.798   0.800
d_m    0.0590  0.0564  0.0541  0.0519  0.0499  0.0481  0.0464
κ_m    4.641   4.707   4.769   4.829   4.886   4.940   4.993

m      16      17      20      30      40      50      100
α_m    0.802   0.804   0.809   0.821   0.830   0.837   0.856
d_m    0.0448  0.0433  0.0394  0.0304  0.0247  0.0209  0.0117
κ_m    5.043   5.091   5.226   5.596   5.885   6.123   6.928

Table 1: Numerical values of the quantities α_m, d_m and κ_m appearing in Theorem
7.1 for various values of m.



References 

[1] Athreya, K. B. and Ney, P. (1972) Branching processes. Die Grundlehren der mathematischen Wissenschaften, Bd. 196, Springer-Verlag, New York-Heidelberg.

[2] Devroye, L. (1998) Branching processes and their applications in the analysis of tree structures and tree algorithms. Probabilistic methods for algorithmic discrete mathematics, 249-314, Algorithms Combin., 16, Springer, Berlin.

[3] Harris, T. E. (1963) The theory of branching processes. Die Grundlehren der Mathematischen Wissenschaften, Bd. 119, Springer-Verlag, Berlin; Prentice-Hall, Inc., Englewood Cliffs, N.J.

[4] Karp, R. and Zhang, Y. (1995) Bounded branching process and AND/OR tree evaluation. Random Structures Algorithms 7, 97-116.

[5] Motwani, R. and Raghavan, P. (1995) Randomized algorithms. Cambridge University Press, Cambridge.

[6] Neininger, R. (2001) On a multivariate contraction method for random recursive structures with applications to Quicksort. Random Structures Algorithms 19, 498-524.

[7] Neininger, R. and Rüschendorf, L. (2004) A general limit theorem for recursive algorithms and combinatorial structures. Ann. Appl. Probab. 14, 378-418.

[8] Rachev, S. T. and Rüschendorf, L. (1995) Probability metrics and recursive algorithms. Adv. in Appl. Probab. 27, 770-799.

[9] Rösler, U. (1991) A limit theorem for “Quicksort”. RAIRO Inform. Théor. Appl. 25, 85-100.

[10] Rösler, U. (1992) A fixed point theorem for distributions. Stochastic Process. Appl. 42, 195-214.

[11] Rösler, U. and Rüschendorf, L. (2001) The contraction method for recursive algorithms. Algorithmica 29, 3-33.

[12] Saks, M. and Wigderson, A. (1986) Probabilistic boolean decision trees and the complexity of evaluating game trees. Proceedings of the 27th Annual IEEE Symposium on Foundations of Computer Science, 29-38, Toronto, Ontario.

[13] Snir, M. (1985) Lower bounds on probabilistic linear decision trees. Theoret. Comput. Sci. 38, 69-82.








Tamur Ali Khan and Ralph Neininger 

Department of Mathematics 
J.W. Goethe University 
Robert-Mayer-Str. 10 
60325 Frankfurt a.M. 

Germany 

{alikhan, neiningr}@ismi.math.uni-frankfurt.de




Trends in Mathematics, © 2004 Birkhäuser Verlag Basel/Switzerland



Polynomial Time Perfect Sampling Algorithm 
for Two-Rowed Contingency Tables 

Shuji Kijima and Tomomi Matsui 



ABSTRACT: This paper proposes a polynomial time perfect (exact) sampling algorithm for $2 \times n$ contingency tables. Our algorithm is a Las Vegas type randomized algorithm and the expected running time is bounded by $O(n^3 \ln N)$, where $n$ is the number of columns and $N$ is the total sum of all entries in a table. The algorithm is based on the monotone coupling from the past (monotone CFTP) algorithm and a new Markov chain for sampling two-rowed contingency tables uniformly. We employ the path coupling method to bound the mixing rate of our chain. Our result indicates that uniform generation of two-rowed contingency tables is easier than the corresponding counting problem, since the counting problem is known to be #P-complete.



1. Introduction 

In this paper, we propose a polynomial time perfect (exact) sampling algorithm for $2 \times n$ contingency tables. Our algorithm is a Las Vegas type randomized algorithm and the expected running time is bounded by $O(n^3 \ln N)$, where $n$ is the number of columns and $N$ is the total sum of all entries in a table. Our result indicates that uniform generation of two-rowed contingency tables is easier than the corresponding counting problem, since the counting problem is known to be #P-complete [11]. The main idea of our algorithm is not simple rejection sampling but the monotone coupling from the past (monotone CFTP) algorithm proposed by Propp and Wilson [18].

For sampling two-rowed contingency tables, Dyer and Greenhill [10] proposed a fully polynomial time approximately uniform sampler based on the Metropolis-Hastings algorithm with a natural Markov chain. They showed that the mixing time of their chain is bounded by $(1/2)\,n(n-1)\ln(N\varepsilon^{-1})$ by using the path coupling technique proposed by Bubley and Dyer [5]. We propose a new Markov chain which is obtained by forbidding some moves of Dyer and Greenhill’s chain. Although we also obtain our bound by the path coupling method, we need to introduce a preprocessing step which does not appear in Dyer and Greenhill’s method. We also introduce a specific partial order on the set of tables and show the monotonicity of our chain. In [18], Propp and Wilson showed that if we have a monotone chain with polynomial mixing rate, there exists a polynomial time monotone CFTP algorithm. By applying their technique to our rapidly mixing monotone chain we can construct a polynomial time perfect sampling algorithm.

A contingency table is a matrix of nonnegative integers with prescribed positive row and column sums. Contingency tables are used in statistics to store data from sample surveys. A test of independence between rows and columns is of statistical interest for contingency tables. The exact test proposed by Fisher [12] is one of the tests for this purpose. Diaconis and Efron also discussed a test for independence in contingency tables [7]. The exact test can be done by systematic enumeration of all tables, but it is hard to enumerate all tables. In practice, the Markov chain Monte Carlo (MCMC) method is used to calculate the p value in the exact test (see [1, 2] for example). An MCMC method is a Monte Carlo method which uses samples from the stationary distribution of a Markov chain. The problem with the MCMC method, however, is how many transitions we have to simulate in order to sample from the stationary distribution. A practical solution to this problem is to use an approximate sampler obtained by interrupting the transitions after a finite time.

There are many works on almost uniform sampling of contingency tables using a Markov chain. Diaconis and Saloff-Coste [8] discussed the rate of convergence of a simple Markov chain for 2-dimensional contingency tables. They showed that the simple chain mixes in time polynomial in the table sum when the numbers of rows and columns are fixed. Dyer, Kannan and Mount [11] proposed a different Markov chain for counting the number of 2-dimensional contingency tables. When the marginal sums are sufficiently large, their chain mixes in time polynomial in the number of rows and columns. For two-rowed tables, Hernek [14] showed that the mixing time of the simple Markov chain is bounded by a polynomial in the table sum and the number of columns. Hernek bounded the mixing time of the chain by using the coupling theorem shown by Aldous [3]. Dyer and Greenhill [10] proposed a rapidly mixing Markov chain for two-rowed contingency tables. Their chain mixes in time polynomial in the number of columns and the logarithm of the table sum. They analyzed the mixing rate of their chain by using the path coupling technique proposed by Bubley and Dyer [4, 5]. In [17], Matsui, Matsui and Ono extended Dyer and Greenhill’s result to $2 \times \cdots \times 2 \times J$ contingency tables. Recently, Cryan, Dyer, Goldberg, Jerrum and Martin [6] showed that the $2 \times 2$ chain, which is an extension of Dyer and Greenhill’s, is rapidly mixing when the number of rows (or columns) is a constant.

Propp and Wilson devised a surprisingly simple algorithm, called the CFTP algorithm (or backward coupling), which produces exact samples from the limit distribution [18, 19]. The CFTP algorithm simulates infinite time transitions of a chain in a (probabilistically) finite time, for any finite Markov chain. In the CFTP algorithm, however, we need to check the “coalescence condition” by executing the simulations from all the states. Thus the CFTP algorithm is not straightforwardly applicable. A monotone CFTP algorithm is an algorithm for a monotone Markov chain, which has a partially ordered state space and a transition rule which preserves the partial order. If the given chain is monotone, the difficulty of simulating from all states is relaxed.

In the next section, we review the (monotone) CFTP algorithm and the theorem proposed by Propp and Wilson in [18]. In Section 3, we propose a new Markov chain for $2 \times n$ contingency tables and a sampling algorithm based on the monotone CFTP algorithm. In Section 4, we show the monotonicity of our chain. In Section 5, we analyze the expected running time.



2. Review of Coupling From The Past Algorithm

When we simulate an ergodic Markov chain for infinite time, we can gain a sample exactly according to the stationary distribution. Suppose that there exists a chain from the infinite past. Then a possible state of the chain at the present time, for which we have evidence of uniqueness regardless of the initial state of the chain, is a realization of a random sample exactly from the stationary distribution. This is the key idea of CFTP.

Suppose that we have an ergodic Markov chain MC with finite state space $\Omega$ and transition matrix $P$. The transition rule of the Markov chain $X \mapsto X'$ can be described by a deterministic function $\phi : \Omega \times [0,1) \to \Omega$, called the update function, as follows. Given a random number $\Lambda$ uniformly distributed over $[0,1)$, the update function $\phi$ satisfies $\Pr(\phi(x,\Lambda) = y) = P(x,y)$ for any $x, y \in \Omega$. We can realize the Markov chain by setting $X' = \phi(X,\Lambda)$. Clearly, the update function corresponding to the given transition matrix $P$ is not unique. The result of the transitions of the chain from time $t_1$ to $t_2$ ($t_1 < t_2$) with a sequence of random numbers $\lambda = (\lambda[t_1], \lambda[t_1+1], \ldots, \lambda[t_2-1]) \in [0,1)^{t_2-t_1}$ is denoted by $\Phi_{t_1}^{t_2}(x,\lambda) : \Omega \times [0,1)^{t_2-t_1} \to \Omega$, where

\[
\Phi_{t_1}^{t_2}(x,\lambda) := \phi(\phi(\cdots(\phi(x,\lambda[t_1]),\ldots,\lambda[t_2-2]),\lambda[t_2-1]).
\]

We say that a sequence $\lambda \in [0,1)^{|T|}$ satisfies the coalescence condition when $\exists y \in \Omega$, $\forall x \in \Omega$, $y = \Phi_T^0(x,\lambda)$.

With these preparations, the standard Coupling From The Past algorithm is expressed as follows.

Algorithm 1. (CFTP Algorithm [18])

Step 1. Set the starting time period $T := -1$ to go back, and set $\lambda$ to be the empty sequence.

Step 2. Generate random real numbers $\lambda[T], \lambda[T+1], \ldots, \lambda[\lceil T/2 \rceil - 1] \in [0,1)$, and insert them at the head of $\lambda$ in order, i.e., put $\lambda := (\lambda[T], \lambda[T+1], \ldots, \lambda[-1])$.

Step 3. Start a chain from each element $x \in \Omega$ at time period $T$, and run each chain to time period 0 according to the update function $\phi$ with the sequence of numbers in $\lambda$. (Here we note that every chain uses the common sequence $\lambda$.)

Step 4. [Coalescence check] The state obtained at time period 0 can be denoted by $\Phi_T^0(x,\lambda)$.

(a) If $\exists y \in \Omega$, $\forall x \in \Omega$, $y = \Phi_T^0(x,\lambda)$, then return $y$ and stop.

(b) Else, update the starting time period $T := 2T$, and go to Step 2.
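The steps of Algorithm 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `cftp` and `lazy_walk` and the four-state toy chain are hypothetical and do not appear in the paper. The toy chain is a lazy reflecting random walk on {0, 1, 2, 3}, whose transition matrix is symmetric, so its stationary distribution is uniform.

```python
import random

def cftp(states, phi, rng):
    """Coupling from the past for a finite chain given by an update function phi(x, u)."""
    T = -1
    lam = []  # lam[i] drives the transition at time period T + i
    while True:
        # Step 2: draw numbers only for the newly added (older) time periods and
        # put them at the head, so numbers already drawn are reused unchanged.
        fresh = [rng.random() for _ in range(-T - len(lam))]
        lam = fresh + lam
        # Step 3: run a chain from every state with the common sequence lam.
        current = set(states)
        for u in lam:
            current = {phi(x, u) for x in current}
        # Step 4: coalescence check.
        if len(current) == 1:
            return current.pop()
        T *= 2  # go further back into the past

def lazy_walk(x, u):
    """Hypothetical toy update function on {0, 1, 2, 3}: down, up, or stay, each w.p. 1/3."""
    if u < 1 / 3:
        return max(x - 1, 0)
    if u < 2 / 3:
        return min(x + 1, 3)
    return x

sample = cftp(range(4), lazy_walk, random.Random(0))
```

Tracking the set of distinct current states is equivalent to running one chain per starting state, because chains that have met stay together under the common sequence.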

Theorem 2.1. (CFTP Theorem [18]) Let MC be an ergodic finite Markov chain with state space $\Omega$, defined by an update function $\phi : \Omega \times [0,1) \to \Omega$. If the CFTP algorithm (Algorithm 1) terminates with probability 1, then the obtained value is a realization of a random variable exactly distributed according to the stationary distribution.

Theorem 2.1 gives a (probabilistically) finite time algorithm for infinite time simulation. However, the simulation from all states executed in Step 3 is a hard requirement.

Suppose that there exists a partial order “$\succeq$” on the set of states $\Omega$. A transition rule expressed by a deterministic update function $\phi$ is called monotone (with respect to “$\succeq$”) if $\forall \lambda \in [0,1)$, $\forall x, \forall y \in \Omega$, $x \succeq y \Rightarrow \phi(x,\lambda) \succeq \phi(y,\lambda)$. For ease, we also say that a chain is monotone if the chain has a monotone transition rule.

Theorem 2.2. (monotone CFTP [18, 9]) Suppose that a Markov chain defined by an update function $\phi$ is monotone with respect to a partially ordered set of states $(\Omega, \succeq)$, and $\exists x_{\max}, \exists x_{\min} \in \Omega$, $\forall x \in \Omega$, $x_{\max} \succeq x \succeq x_{\min}$. Then the CFTP algorithm (Algorithm 1) terminates with probability 1, and a sequence $\lambda \in [0,1)^{|T|}$ satisfies the coalescence condition, i.e., $\exists y \in \Omega$, $\forall x \in \Omega$, $y = \Phi_T^0(x,\lambda)$, if and only if $\Phi_T^0(x_{\max},\lambda) = \Phi_T^0(x_{\min},\lambda)$.




When the given Markov chain satisfies the conditions of Theorem 2.2, we can modify Algorithm 1 by replacing Step 4 (a) with

Step 4. (a)' If $\exists y \in \Omega$, $y = \Phi_T^0(x_{\max},\lambda) = \Phi_T^0(x_{\min},\lambda)$, then return $y$.

The algorithm obtained by the above modification is called a monotone CFTP algorithm.



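3. Perfect Sampler for 2 × n Contingency Tables

In this section, we introduce our algorithm. We denote the set of real numbers by $\mathbb{R}$ and the sets of integers, nonnegative integers and positive integers by $\mathbb{Z}$, $\mathbb{Z}_+$ and $\mathbb{Z}_{++}$, respectively. Let $r = (r_1, r_2) \in \mathbb{Z}_{++}^2$ and $s = (s_1, \ldots, s_n) \in \mathbb{Z}_{++}^n$ be a pair of vectors satisfying $\sum_{i=1}^{2} r_i = \sum_{j=1}^{n} s_j = N \in \mathbb{Z}_{++}$. The set $\Xi$ of $2 \times n$ contingency tables with row and column sums $(r, s)$ is defined by

\[
\Xi := \Bigl\{ X \in \mathbb{Z}_+^{2 \times n} \Bigm|
\sum_{j=1}^{n} X[i,j] = r_i \ (1 \le \forall i \le 2),\quad
\sum_{i=1}^{2} X[i,j] = s_j \ (1 \le \forall j \le n) \Bigr\},
\]

where $X[i,j]$ is the value in the cell indexed by the $i$th row and $j$th column.

We propose a new Markov chain M with state space $\Xi$ for given $r$ and $s$. For any column index $j \in \{1, \ldots, n-1\}$, we define

\[
a_X(j) := X[1,j] + X[1,j+1], \tag{1}
\]
\[
b_X(j) := X[2,j] + X[2,j+1], \tag{2}
\]
\[
\theta_X(j) := \min\{a_X(j), b_X(j), s_j, s_{j+1}\} + 1. \tag{3}
\]

The transition rule of M is defined by the following update function $\phi : \Xi \times [1,n) \to \Xi$. For a current state $X \in \Xi$, the next state $X' = \phi(X,\lambda) \in \Xi$ with respect to a random number $\lambda \in [1,n)$ is defined by

\[
X'[1,j] :=
\begin{cases}
\min\{a_X(\lfloor\lambda\rfloor), s_{\lfloor\lambda\rfloor}\} - \lfloor(\lambda - \lfloor\lambda\rfloor)\,\theta_X(\lfloor\lambda\rfloor)\rfloor & (j = \lfloor\lambda\rfloor),\\
a_X(\lfloor\lambda\rfloor) - \min\{a_X(\lfloor\lambda\rfloor), s_{\lfloor\lambda\rfloor}\} + \lfloor(\lambda - \lfloor\lambda\rfloor)\,\theta_X(\lfloor\lambda\rfloor)\rfloor & (j = \lfloor\lambda\rfloor + 1),\\
X[1,j] & (\text{otherwise}),
\end{cases}
\]
\[
X'[2,j] := s_j - X'[1,j].
\]

In other words, the pair of adjacent columns $\lfloor\lambda\rfloor$ and $\lfloor\lambda\rfloor + 1$ is chosen, the sum $a_X(\lfloor\lambda\rfloor)$ of their first-row entries is preserved, and the fractional part of $\lambda$ selects one of the $\theta_X(\lfloor\lambda\rfloor)$ feasible values of $X'[1,\lfloor\lambda\rfloor]$.

Our chain M is a modification of Dyer and Greenhill’s chain ([10]) obtained by restricting the choice to consecutive pairs of columns. Clearly M is finite, aperiodic and irreducible, and so ergodic. The chain has a unique stationary distribution, which is the uniform distribution.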

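We define two special tables $X_U$ and $X_L \in \Xi$ by

\[
X_U := \Bigl( X \in \Xi \Bigm| \exists k \in \{1,\ldots,n\},\
r_1 = \sum_{j=1}^{k} X[1,j] \le \sum_{j=1}^{k} s_j,\
X[2,j] = 0 \ (j = 1, \ldots, k-1) \Bigr),
\]
\[
X_L := \Bigl( X \in \Xi \Bigm| \exists l \in \{1,\ldots,n\},\
r_1 = \sum_{j=l}^{n} X[1,j] \le \sum_{j=l}^{n} s_j,\
X[2,j] = 0 \ (j = l+1, \ldots, n) \Bigr).
\]

Here we note that $X_U$ and $X_L$ are obtained by the North-West corner rule and the North-East corner rule, respectively. Now we describe our sampling algorithm.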



Algorithm 2.

Step 1. Set the starting time period $T := -1$ to go back, and set $\lambda$ to be the empty sequence.

Step 2. Generate random real numbers $\lambda[T], \lambda[T+1], \ldots, \lambda[\lceil T/2 \rceil - 1] \in [1,n)$, and put $\lambda := (\lambda[T], \lambda[T+1], \ldots, \lambda[-1])$.

Step 3. Start two chains from $X_U$ and $X_L$, respectively, at time period $T$, and run them to time period 0 according to the update function $\phi$ with the sequence of numbers in $\lambda$.

Step 4. [Coalescence check]

(a) If $\exists Y \in \Xi$, $Y = \Phi_T^0(X_U,\lambda) = \Phi_T^0(X_L,\lambda)$, then return $Y$ and stop.

(b) Else, update the starting time period $T := 2T$, and go to Step 2.
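Algorithm 2 can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the names `phi`, `corner` and `sample_table` are hypothetical, a table is stored by its first row only (the second row is $s$ minus the first), and each $\lambda$ is drawn as a uniform integer part in $\{1, \ldots, n-1\}$ plus a uniform fractional part. The two `min` clamps in `phi` are an added safeguard against floating-point rounding of $\lambda$ up to the next integer and are not part of the paper's definition. The sketch assumes $n \ge 2$ and consistent margins.

```python
import math
import random

def phi(row, s, lam):
    """Update function of the chain M, acting on the first row of a table."""
    k = min(math.floor(lam), len(s) - 1)   # left column of the chosen pair (1-based)
    a = row[k - 1] + row[k]                # a_X(k): first-row sum over the pair
    b = s[k - 1] + s[k] - a                # b_X(k): second-row sum over the pair
    theta = min(a, b, s[k - 1], s[k]) + 1  # number of feasible values
    shift = min(math.floor((lam - k) * theta), theta - 1)
    new = list(row)
    new[k - 1] = min(a, s[k - 1]) - shift
    new[k] = a - new[k - 1]
    return new

def corner(r1, s, from_left):
    """First row given by the North-West (from_left) or North-East corner rule."""
    order = range(len(s)) if from_left else reversed(range(len(s)))
    row, rest = [0] * len(s), r1
    for j in order:
        row[j] = min(rest, s[j])
        rest -= row[j]
    return row

def sample_table(r, s, rng):
    """Monotone CFTP started from the two corner tables; returns a 2 x n table."""
    n = len(s)
    upper, lower = corner(r[0], s, True), corner(r[0], s, False)
    T, lam = -1, []
    while True:
        # Prepend numbers for the newly added time periods; older draws are reused.
        fresh = [rng.randrange(1, n) + rng.random() for _ in range(-T - len(lam))]
        lam = fresh + lam
        x, y = upper, lower
        for u in lam:
            x, y = phi(x, s, u), phi(y, s, u)
        if x == y:  # coalescence of the maximum and minimum chains
            return [x, [sj - xj for sj, xj in zip(s, x)]]
        T *= 2

table = sample_table((3, 3), (2, 2, 2), random.Random(0))
```

For the margins used in the last line there are exactly seven tables, so repeated calls should return each of them with roughly equal frequency.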

Theorem 3.1. With probability 1, Algorithm 2 terminates and returns a table. The table obtained by Algorithm 2 is a realization of a random sample distributed exactly according to the uniform distribution on $\Xi$.

Theorem 3.1 guarantees that Algorithm 2 is a perfect sampling algorithm. We prove Theorem 3.1 by showing the monotonicity in the next section.



4. Monotonicity of the Chain

In Section 2, we described two theorems. Thus to prove Theorem 3.1, we only need to show that Algorithm 2 is a monotone CFTP algorithm. For this purpose, in this section, we introduce a partial order on $\Xi$, and show that $X_U$ and $X_L$ are the unique maximum and minimum elements of $\Xi$, and that the Markov chain M is monotone. For any $X \in \Xi$, we define the cumulative sum vector $f_X \in \mathbb{Z}_+^{n+1}$ by

\[
f_X(i) :=
\begin{cases}
0 & (i = 0),\\
X[1,1] + \cdots + X[1,i] & (i \in \{1,\ldots,n\}),
\end{cases}
\]

where $f_X = (f_X(0), f_X(1), \ldots, f_X(n))$. Obviously from the definition, there exists a bijection between $\Xi$ and $\{f_X \mid X \in \Xi\}$. For any pair $X, Y \in \Xi$, we say $X \succeq Y$ if and only if $f_X - f_Y \ge 0$. It is clear that the relation “$\succeq$” is a partial order on $\Xi$. We can see easily that $X_U \succeq X \succeq X_L$ for any $X \in \Xi$.
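The order just defined is easy to test mechanically. The following Python sketch is an illustration only: the helper names `cumulative`, `dominates` and `covers_at` are hypothetical, and a table is again represented by its first row. The function `covers_at` uses the entrywise characterization of the covering relation stated in this section (a difference of +1 at column $k$, -1 at column $k+1$, and 0 elsewhere).

```python
from itertools import accumulate

def cumulative(first_row):
    """Cumulative sum vector f_X = (f_X(0), ..., f_X(n)) of the first row."""
    return [0] + list(accumulate(first_row))

def dominates(x_row, y_row):
    """True when X >= Y in the partial order, i.e. f_X - f_Y >= 0 componentwise."""
    return all(fx >= fy for fx, fy in zip(cumulative(x_row), cumulative(y_row)))

def covers_at(x_row, y_row):
    """Return the 1-based index k at which X covers Y, or None if X does not cover Y."""
    diff = [x - y for x, y in zip(x_row, y_row)]
    for k in range(1, len(diff)):
        expected = [0] * len(diff)
        expected[k - 1], expected[k] = 1, -1
        if diff == expected:
            return k
    return None
```

For column sums s = (3, 2, 2) and r_1 = 4, the corner tables have first rows (3, 1, 0) and (0, 2, 2), and `dominates` confirms that (3, 1, 0) dominates (2, 1, 1), which in turn dominates (0, 2, 2). The rows (2, 0, 2) and (1, 2, 1) are incomparable, which shows that the order is only partial.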

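We say that a state $X \in \Xi$ covers $Y \in \Xi$ (at $k$), denoted by $X \mathrel{\cdot\!\succ} Y$ (or $X \mathrel{\cdot\!\succ_k} Y$), when

\[
f_X(i) - f_Y(i) =
\begin{cases}
1 & (i = k),\\
0 & (\text{otherwise}).
\end{cases}
\]

Note that $X \mathrel{\cdot\!\succ_k} Y$ if and only if

\[
X[1,i] - Y[1,i] =
\begin{cases}
+1 & (i = k),\\
-1 & (i = k+1),\\
0 & (\text{otherwise}).
\end{cases}
\]

Lemma 4.1. If a pair of distinct states $X, Y \in \Xi$ satisfies $X \succeq Y$, then $\exists Z \in \Xi$, $X \mathrel{\cdot\!\succ} Z \succeq Y$.

The proof is omitted; it is not hard to show the above lemma.

The following is a key lemma for proving the monotonicity of our chain.

Lemma 4.2. If a pair of states $X, Y \in \Xi$ satisfies $X \mathrel{\cdot\!\succ_k} Y$, then $\forall \lambda \in [1,n)$, $\phi(X,\lambda) \succeq \phi(Y,\lambda)$.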


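Proof: We denote $\phi(X,\lambda) = X'$ and $\phi(Y,\lambda) = Y'$ for simplicity. For any index $i \ne \lfloor\lambda\rfloor$, it is clear that $f_{X'}(i) = f_X(i)$ and $f_{Y'}(i) = f_Y(i)$, and so $f_{X'}(i) - f_{Y'}(i) = f_X(i) - f_Y(i) \ge 0$ since $X \succeq Y$. In case that $i = \lfloor\lambda\rfloor$,

\[
\begin{aligned}
f_{X'}(\lfloor\lambda\rfloor) - f_{Y'}(\lfloor\lambda\rfloor)
&= \bigl(f_{X'}(\lfloor\lambda\rfloor - 1) + X'[1,\lfloor\lambda\rfloor]\bigr) - \bigl(f_{Y'}(\lfloor\lambda\rfloor - 1) + Y'[1,\lfloor\lambda\rfloor]\bigr)\\
&= \{f_X(\lfloor\lambda\rfloor - 1) - f_Y(\lfloor\lambda\rfloor - 1)\} + \bigl(X'[1,\lfloor\lambda\rfloor] - Y'[1,\lfloor\lambda\rfloor]\bigr)\\
&= \{f_X(\lfloor\lambda\rfloor - 1) - f_Y(\lfloor\lambda\rfloor - 1)\}\\
&\quad + \min\{a_X, s_{\lfloor\lambda\rfloor}\} - \lfloor(\lambda - \lfloor\lambda\rfloor)\theta_X\rfloor - \min\{a_Y, s_{\lfloor\lambda\rfloor}\} + \lfloor(\lambda - \lfloor\lambda\rfloor)\theta_Y\rfloor\\
&= \begin{cases}
\Delta_{XY} + \Delta\theta & (\lfloor\lambda\rfloor \ne k+1),\\
1 + \Delta_{XY} + \Delta\theta & (\lfloor\lambda\rfloor = k+1),
\end{cases}
\end{aligned}
\]

where $a_X := a_X(\lfloor\lambda\rfloor)$, $a_Y := a_Y(\lfloor\lambda\rfloor)$, $\theta_X := \theta_X(\lfloor\lambda\rfloor)$, $\theta_Y := \theta_Y(\lfloor\lambda\rfloor)$ (see (1) and (3) for detail), $\Delta_{XY} := \min\{a_X, s_{\lfloor\lambda\rfloor}\} - \min\{a_Y, s_{\lfloor\lambda\rfloor}\}$ and $\Delta\theta := -\lfloor(\lambda - \lfloor\lambda\rfloor)\theta_X\rfloor + \lfloor(\lambda - \lfloor\lambda\rfloor)\theta_Y\rfloor$.

1. Consider the case that $\lfloor\lambda\rfloor = k - 1$. Then $a_X = a_Y + 1$ and $b_X = b_Y - 1$, where $b_X := b_X(\lfloor\lambda\rfloor)$ and $b_Y := b_Y(\lfloor\lambda\rfloor)$ (see (2) for detail).

(a) If $a_Y \ge s_{\lfloor\lambda\rfloor}$, then $\Delta_{XY} = 0$ and $\theta_X \le \theta_Y$. Thus $\Delta\theta \ge 0$, hence $f_{X'}(\lfloor\lambda\rfloor) - f_{Y'}(\lfloor\lambda\rfloor) \ge 0$.

(b) If $a_Y < s_{\lfloor\lambda\rfloor}$, then $\Delta_{XY} = 1$ and $\theta_X \le \theta_Y + 1$. Thus $\Delta\theta \ge -1$, hence $f_{X'}(\lfloor\lambda\rfloor) - f_{Y'}(\lfloor\lambda\rfloor) \ge 0$.

2. Consider the case that $\lfloor\lambda\rfloor = k + 1$. Then $a_X = a_Y - 1$ and $b_X = b_Y + 1$.

(a) If $a_X \ge s_{\lfloor\lambda\rfloor}$, then $1 + \Delta_{XY} \ge 1$ and $\theta_X \le \theta_Y + 1$. Thus $\Delta\theta \ge -1$ and $f_{X'}(\lfloor\lambda\rfloor) - f_{Y'}(\lfloor\lambda\rfloor) \ge 0$.

(b) If $a_X < s_{\lfloor\lambda\rfloor}$, then $1 + \Delta_{XY} \ge 0$. Note that $a_X + b_X = a_Y + b_Y = s_{\lfloor\lambda\rfloor} + s_{\lfloor\lambda\rfloor+1}$, and so $\theta_X \le \theta_Y$. Thus $\Delta\theta \ge 0$, hence $f_{X'}(\lfloor\lambda\rfloor) - f_{Y'}(\lfloor\lambda\rfloor) \ge 0$.

3. Consider the remaining case that $\lfloor\lambda\rfloor \ne k + 1$ and $\lfloor\lambda\rfloor \ne k - 1$. Then $a_X = a_Y$, $\Delta_{XY} = 0$, $\Delta\theta = 0$, and $f_{X'}(\lfloor\lambda\rfloor) - f_{Y'}(\lfloor\lambda\rfloor) = 0$.

From the above, we have $f_{X'} \ge f_{Y'}$ and so $\phi(X,\lambda) \succeq \phi(Y,\lambda)$. □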

Lemma 4.3. The Markov chain M is monotone, i.e., $\forall \lambda \in [1,n)$, $\forall X, \forall Y \in \Xi$, $X \succeq Y \Rightarrow \phi(X,\lambda) \succeq \phi(Y,\lambda)$.

Proof: By applying Lemma 4.1 repeatedly, we can show that for any pair of states $X, Y \in \Xi$ satisfying $X \succeq Y$, there exists a sequence $X = Z_0, Z_1, \ldots, Z_R = Y$ of appropriate length such that $Z_i \in \Xi$ ($0 \le i \le R$) and $Z_0 \mathrel{\cdot\!\succ} Z_1 \mathrel{\cdot\!\succ} \cdots \mathrel{\cdot\!\succ} Z_R$. Then Lemma 4.2 implies that $\phi(Z_0,\lambda) \succeq \phi(Z_1,\lambda) \succeq \cdots \succeq \phi(Z_R,\lambda)$ for any $\lambda \in [1,n)$. Thus $\forall \lambda \in [1,n)$, $\phi(X,\lambda) \succeq \phi(Y,\lambda)$. □

Lastly, we show the correctness of our algorithm.

Proof of Theorem 3.1: From Lemma 4.3, the Markov chain M is monotone, and it is clear that $X_U$ and $X_L$ are the unique maximum and minimum elements. Then Algorithm 2 is a monotone CFTP algorithm, and so Theorem 3.1 follows from Theorem 2.1 and Theorem 2.2. □



5. Expected Running Time

Here, we discuss the running time of our algorithm. In this section, we assume that a special preprocessing step has been applied, so that the following condition holds.

Condition 1. Column sum vector $s$ satisfies $s_1 \ge s_2 \ge \cdots \ge s_n$.

We can assume Condition 1 by sorting the column sums in $O(n \ln n)$ time. The following is the main result of this paper.

Theorem 5.1. Under Condition 1, the expected running time of Algorithm 2 is bounded by $O(n^3 \ln N)$, where $n$ is the number of columns and $N$ is the total sum of all entries in a table of $\Xi$.

In the rest of this section, we prove Theorem 5.1 by estimating the expectation of the coalescence time $T_* \in \mathbb{Z}_{++}$ defined by $T_* := \min\{t > 0 \mid \exists y \in \Omega,\ \forall x \in \Omega,\ y = \Phi_{-t}^0(x,\Lambda)\}$. Note that $T_*$ is a random variable.

Given a pair of probability distributions $\nu_1$ and $\nu_2$ on the finite state space $\Omega$, the total variation distance between $\nu_1$ and $\nu_2$ is defined by $d_{TV}(\nu_1,\nu_2) := \frac{1}{2}\sum_{x \in \Omega} |\nu_1(x) - \nu_2(x)|$. The mixing rate of an ergodic Markov chain is defined by $\tau := \max_{x \in \Omega}\{\min\{t \mid \forall s \ge t,\ d_{TV}(\pi, P_x^s) \le 1/e\}\}$, where $\pi$ is the stationary distribution and $P_x^s$ is the probability distribution of the chain at time period $s \ge 0$ with initial state $x$ at time period 0. The Path Coupling Theorem is a useful technique for bounding the mixing rate.
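For a chain small enough to store its transition matrix, both definitions can be evaluated directly. The Python sketch below is an illustration only: the function names and the list-of-lists matrix representation are assumptions, and the stationary distribution is supplied by the caller rather than computed. It stops at the first time the distance drops to $1/e$, which agrees with the definition because the total variation distance to the stationary distribution never increases along the chain.

```python
import math

def total_variation(nu1, nu2):
    """d_TV(nu1, nu2) = (1/2) * sum_x |nu1(x) - nu2(x)| for distributions given as lists."""
    return 0.5 * sum(abs(p - q) for p, q in zip(nu1, nu2))

def step(dist, P):
    """One transition: the row vector dist multiplied by the transition matrix P."""
    n = len(P)
    return [sum(dist[x] * P[x][y] for x in range(n)) for y in range(n)]

def mixing_rate(P, pi, max_steps=10_000):
    """Smallest t with d_TV(pi, P_x^t) <= 1/e, maximized over all initial states x."""
    threshold = 1.0 / math.e
    worst = 0
    for x in range(len(P)):
        dist = [1.0 if y == x else 0.0 for y in range(len(P))]
        t = 0
        while total_variation(pi, dist) > threshold:
            dist = step(dist, P)
            t += 1
            if t > max_steps:
                raise RuntimeError("no convergence within max_steps")
        worst = max(worst, t)
    return worst
```

For example, the two-state chain with all transition probabilities equal to 1/2 has `mixing_rate([[0.5, 0.5], [0.5, 0.5]], [0.5, 0.5]) == 1`.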



Theorem 5.2. (Path Coupling [5]) Let MC be a finite ergodic Markov chain with state space $\Omega$. Let $G = (\Omega, \mathcal{E})$ be a connected undirected graph with vertex set $\Omega$ and edge set $\mathcal{E} \subseteq \binom{\Omega}{2}$. Let $l : \mathcal{E} \to \mathbb{R}$ be a positive length defined on the edge set. For any pair of vertices $\{x, y\}$ of $G$, the distance between $x$ and $y$, denoted by $d(x,y)$ and/or $d(y,x)$, is the length of a shortest path between $x$ and $y$, where the length of a path is the sum of the lengths of the edges in the path. Suppose that there exists a joint process $(X,Y) \mapsto (X',Y')$ with respect to MC whose marginals are faithful copies of MC and which satisfies

\[
0 < \exists \beta < 1,\quad \forall \{X,Y\} \in \mathcal{E},\quad \mathrm{E}[d(X',Y')] \le \beta\, d(X,Y).
\]

Then the mixing rate $\tau$ of the Markov chain MC satisfies $\tau \le (1-\beta)^{-1}(1 + \ln(D/d))$, where $d := \min\{d(x,y) \mid x, y \in \Omega,\ x \ne y\}$ and $D := \max\{d(x,y) \mid x, y \in \Omega\}$.

The above theorem differs from the original theorem in [5] since the integrality of the edge length is not assumed. We drop the integrality and introduce the minimum distance $d$. This modification is not essential and we can show Theorem 5.2 similarly.



Now, we show the polynomiality of Algorithm 2. First, we estimate the mixing rate of our chain M by employing the Path Coupling Theorem. In the proof of the following lemma, Condition 1 plays an important role.

Lemma 5.3. Under Condition 1, the mixing rate $\tau$ of our Markov chain M satisfies $\tau \le n(n-1)^2(1 + \ln(nN))$.

Proof: Let $G = (\Xi, \mathcal{E})$ be an undirected simple graph with vertex set $\Xi$ and edge set $\mathcal{E}$ defined as follows. A pair of vertices $\{X, Y\}$ is an edge if and only if $(1/2)\sum_{j=1}^{n} |X[1,j] - Y[1,j]| = 1$. Clearly, the graph $G$ is connected. For each edge $e = \{X, Y\} \in \mathcal{E}$, there exists a unique pair of indices $j_1, j_2 \in \{1, \ldots, n\}$, called the supporting pair of $e$, satisfying

\[
|X[1,j] - Y[1,j]| =
\begin{cases}
1 & (j \in \{j_1, j_2\}),\\
0 & (\text{otherwise}).
\end{cases}
\]

We define the length $l(e)$ of an edge $e$ by $l(e) := (1/(n-1))\sum_{i=1}^{j^*-1}(n-i)$, where $j^* = \max\{j_1, j_2\} \ge 2$ and $\{j_1, j_2\}$ is the supporting pair of $e$. Note that $1 \le \min_{e \in \mathcal{E}} l(e) \le \max_{e \in \mathcal{E}} l(e) \le n/2$. For each pair $X, Y \in \Xi$, we define the distance $d(X,Y)$ as the length of the shortest path between $X$ and $Y$ on $G$. Clearly, the diameter of $G$, i.e., $\max\{d(X,Y)\}$, is bounded by $nN$. The definition of edge length implies that for any edge $\{X,Y\} \in \mathcal{E}$, $d(X,Y) = l(\{X,Y\})$.

We define a joint process $(X,Y) \mapsto (X',Y')$ by $(X,Y) \mapsto (\phi(X,\Lambda), \phi(Y,\Lambda))$ with a uniform real random number $\Lambda \in [1,n)$, where $\phi$ is the update function defined in Section 3. Now we show that

\[
\mathrm{E}[d(X',Y')] \le \beta\, d(X,Y), \qquad \beta = 1 - 1/(n(n-1)^2), \tag{4}
\]

for any pair $\{X,Y\} \in \mathcal{E}$. In the following, we denote the supporting pair of $\{X,Y\}$ by $\{j_1, j_2\}$. Without loss of generality, we can assume that $j_1 < j_2$ and $X[1,j_2] = Y[1,j_2] - 1$. In the following proof, we define $a_X, a_Y, b_X, b_Y, \theta_X, \theta_Y$ in a similar way as in the proof of Lemma 4.2, with $\lfloor\Lambda\rfloor$ in place of $\lfloor\lambda\rfloor$.

1. When $\lfloor\Lambda\rfloor = j_2 - 1$, we show that $\mathrm{E}[d(X',Y') \mid \lfloor\Lambda\rfloor = j_2 - 1] \le d(X,Y) - (1/2)(n - j_2 + 1)/(n-1)$.

(a) In case that $j_1 = j_2 - 1$, $X' = Y'$ with conditional probability 1. Hence $d(X',Y') = 0$.

(b) In case that $j_1 \ne j_2 - 1$ and $\theta_X = \theta_Y$. Condition 1 implies that $s_{j_2-1} \ge s_{j_2}$. Since $\theta_X = \theta_Y$, $a_Y > a_X$ and $b_Y < b_X$, we have $\theta_X = \theta_Y = \min\{s_{j_2-1}, s_{j_2}\} + 1 = s_{j_2} + 1$. Thus we have $s_{j_2-1} \ge a_Y > a_X$, and so $X'[1,j_2-1] = a_X - \lfloor(\Lambda - \lfloor\Lambda\rfloor)\theta_X\rfloor$ and $Y'[1,j_2-1] = a_Y - \lfloor(\Lambda - \lfloor\Lambda\rfloor)\theta_Y\rfloor$ by the definition of the function $\phi$. Then $X'[1,j_2-1] = Y'[1,j_2-1] - 1$ since $a_X = a_Y - 1$. Additionally, since $X'[1,j_2] = a_X - X'[1,j_2-1]$ and $Y'[1,j_2] = a_Y - Y'[1,j_2-1]$, we have $X'[1,j_2] = Y'[1,j_2]$. Hence $d(X',Y') = d(X,Y) - (n - j_2 + 1)/(n-1)$ with conditional probability 1.

(c) In case that $j_1 \ne j_2 - 1$ and $\theta_X \ne \theta_Y$. Clearly $|\theta_X - \theta_Y| = 1$. First, we discuss the case that $\theta_X = \theta_Y - 1$. We only need to consider two cases: one is the case that $\lfloor(\Lambda - \lfloor\Lambda\rfloor)\theta_X\rfloor = \lfloor(\Lambda - \lfloor\Lambda\rfloor)\theta_Y\rfloor$ and the other is that $\lfloor(\Lambda - \lfloor\Lambda\rfloor)\theta_X\rfloor = \lfloor(\Lambda - \lfloor\Lambda\rfloor)\theta_Y\rfloor - 1$. In the former case, we have $X'[1,j_2-1] = Y'[1,j_2-1]$ and $X'[1,j_2] = Y'[1,j_2] - 1$, and so $d(X',Y') = d(X,Y)$. In the latter case, we have $X'[1,j_2-1] = Y'[1,j_2-1] - 1$ and $X'[1,j_2] = Y'[1,j_2]$, and so $d(X',Y') = d(X,Y) - (n - j_2 + 1)/(n-1)$. These two cases appear with the same probability 1/2, hence $\mathrm{E}[d(X',Y') \mid \lfloor\Lambda\rfloor = j_2 - 1,\ j_1 \ne j_2 - 1,\ \theta_X = \theta_Y - 1] = d(X,Y) - (1/2)(n - j_2 + 1)/(n-1)$. We can show the remaining case that $\theta_X = \theta_Y + 1$ in a similar way.

2. When $\lfloor\Lambda\rfloor = j_2$, we show that $\mathrm{E}[d(X',Y') \mid \lfloor\Lambda\rfloor = j_2] \le d(X,Y) + (1/2)(n - j_2)/(n-1)$.

(a) In case that $\theta_X = \theta_Y$, we obtain $X'[1,j_2] = Y'[1,j_2] - 1$ and $X'[1,j_2+1] = Y'[1,j_2+1]$ in the same way as Case 1-(b). Hence $d(X',Y') = d(X,Y)$ with conditional probability 1.

(b) Consider the case that $\theta_X \ne \theta_Y$. In a similar way to Case 1-(c), we can show that $d(X',Y') = d(X,Y)$ with conditional probability 1/2 and $d(X',Y') = d(X,Y) + (n - j_2)/(n-1)$ with conditional probability 1/2. Hence $\mathrm{E}[d(X',Y') \mid \lfloor\Lambda\rfloor = j_2,\ \theta_X \ne \theta_Y] = d(X,Y) + (1/2)(n - j_2)/(n-1)$.

3. When $\lfloor\Lambda\rfloor \ne j_2 - 1$ and $\lfloor\Lambda\rfloor \ne j_2$, $\{X',Y'\}$ is also an edge of $G$. It is easy to see that $j_2 = \max\{j_1', j_2'\}$, where $\{j_1', j_2'\}$ is the supporting pair of $\{X',Y'\}$. Thus we have $d(X',Y') = d(X,Y)$.

The probability of Case 1 is equal to $1/(n-1)$, and that of Case 2 is less than or equal to $1/(n-1)$. From the above,

\[
\begin{aligned}
\mathrm{E}[d(X',Y')]
&\le d(X,Y) - \frac{1}{n-1}\cdot\frac{1}{2}\cdot\frac{n - j_2 + 1}{n-1} + \frac{1}{n-1}\cdot\frac{1}{2}\cdot\frac{n - j_2}{n-1}\\
&= d(X,Y) - \frac{1}{2(n-1)^2}\\
&\le \Bigl(1 - \frac{1}{2(n-1)^2 \max_{\{X,Y\} \in \mathcal{E}}\{d(X,Y)\}}\Bigr) d(X,Y)\\
&\le \Bigl(1 - \frac{1}{n(n-1)^2}\Bigr) d(X,Y).
\end{aligned}
\]

Since the diameter of $G$ is bounded by $nN$, Theorem 5.2 implies that the mixing rate $\tau$ satisfies

\[
\tau \le n(n-1)^2(1 + \ln(nN)). \qquad □
\]

Next, we estimate the coalescence time.



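Lemma 5.4. Under Condition 1, the coalescence time $T_*$ of M satisfies $\mathrm{E}[T_*] = O(n^3 \ln N)$.

Proof: Let $G = (\Xi, \mathcal{E})$ be the undirected graph and $d(X,Y)$, $\forall X, \forall Y \in \Xi$, be the metric on $G$, both of which are defined in the proof of Lemma 5.3. We define $D_G := d(X_U, X_L)$ and $\tau_0 := n(n-1)^2(1 + \ln D_G)$. By using the inequality (4) obtained in the proof of Lemma 5.3, we have

\[
\begin{aligned}
\Pr(T_* > \tau_0)
&= \Pr\bigl(\Phi_{-\tau_0}^{0}(X_U,\Lambda) \ne \Phi_{-\tau_0}^{0}(X_L,\Lambda)\bigr)\\
&= \Pr\bigl(\Phi_{0}^{\tau_0}(X_U,\Lambda) \ne \Phi_{0}^{\tau_0}(X_L,\Lambda)\bigr)\\
&\le \sum_{X,Y \in \Xi} d(X,Y)\,\Pr\bigl(X = \Phi_{0}^{\tau_0}(X_U,\Lambda),\ Y = \Phi_{0}^{\tau_0}(X_L,\Lambda)\bigr)\\
&= \mathrm{E}\bigl[d\bigl(\Phi_{0}^{\tau_0}(X_U,\Lambda), \Phi_{0}^{\tau_0}(X_L,\Lambda)\bigr)\bigr]
\le \Bigl(1 - \frac{1}{n(n-1)^2}\Bigr)^{\tau_0} d(X_U, X_L)\\
&= \Bigl(1 - \frac{1}{n(n-1)^2}\Bigr)^{n(n-1)^2(1+\ln D_G)} D_G
\le e^{-1}\,\frac{1}{D_G}\,D_G = \frac{1}{e}.
\end{aligned}
\]

By sub-multiplicativity of the coalescence time ([18]), for any $k \in \mathbb{Z}_+$, $\Pr(T_* > k\tau_0) \le (\Pr(T_* > \tau_0))^k \le (1/e)^k$. Thus

\[
\begin{aligned}
\mathrm{E}[T_*] = \sum_{t=0}^{\infty} t \Pr(T_* = t)
&\le \tau_0 + \tau_0 \Pr(T_* > \tau_0) + \tau_0 \Pr(T_* > 2\tau_0) + \cdots\\
&\le \tau_0 + \tau_0/e + \tau_0/e^2 + \cdots = \frac{\tau_0}{1 - 1/e} \le 2\tau_0.
\end{aligned}
\]

Clearly, $D_G \le nN \le N^2$ because $n \le N$. Then we obtain the result that $\mathrm{E}[T_*] = O(n^3 \ln N)$. □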
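As a rough numerical illustration of the bound above (a worked example, not a computation taken from the paper), consider an instance with $n = 4$ columns and table sum $N = 1000$, and use only the estimate $D_G \le nN$:

```latex
\[
\tau_0 = n(n-1)^2(1 + \ln D_G)
\le 4 \cdot 3^2 \cdot (1 + \ln 4000)
\approx 36 \times 9.294 \approx 334.6,
\qquad
\mathrm{E}[T_*] \le 2\tau_0 \le 669.2 .
\]
```

These are upper bounds on the expectation under Condition 1, so an individual instance may coalesce much faster.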



Lastly, we determine the expected running time of Algorithm 2.

Proof of Theorem 5.1: Let $T_*$ be the coalescence time of our chain. Note that $T_*$ is a random variable. Put $K = \lceil \log_2 T_* \rceil$. Algorithm 2 terminates when we set the starting time period $T$ to $-2^K$ at the $(K+1)$st iteration. Then the total number of generated random numbers in Algorithm 2 is bounded by $2^K \le 2T_*$ and the total number of simulated transitions is bounded by $2(2^0 + 2^1 + 2^2 + \cdots + 2^K) < 2 \cdot 2 \cdot 2^K \le 8T_*$. Under the assumption that we can generate a random number in constant time, each transition of a chain is simulated in constant time. Step 4 of Algorithm 2, “Coalescence check,” requires $O(n)$ time. Thus the expectation of the total time complexity is bounded by $O(\mathrm{E}[2T_*] + \mathrm{E}[8T_*] + \mathrm{E}[K+1]\,n) = O(\mathrm{E}[T_*]) = O(n^3 \ln N)$. □
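The counting in this proof can be checked with a short Python sketch. The function name `doubling_cost` is hypothetical; it simply tallies, for a given coalescence time, the quantities bounded in the proof, assuming two chains are simulated in every iteration as in Algorithm 2.

```python
def doubling_cost(t_star):
    """Return (K, random numbers drawn, transitions simulated) for coalescence time t_star >= 1."""
    K = (t_star - 1).bit_length()            # K = ceil(log2(t_star)) in exact integer arithmetic
    random_numbers = 2 ** K                  # length of the sequence at the final iteration
    transitions = 2 * sum(2 ** i for i in range(K + 1))  # two chains, T = -1, -2, ..., -2^K
    return K, random_numbers, transitions
```

For instance, `doubling_cost(5)` gives `(3, 8, 30)`: eight random numbers and thirty transitions, both within the bounds $2T_* = 10$ and $8T_* = 40$.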



6. Discussions

We proposed a perfect sampling algorithm for $2 \times n$ contingency tables. Our algorithm is based on a new Markov chain which is monotone and rapidly mixing. The rapid mixing implies the polynomiality of our algorithm. More precisely, Algorithm 2 produces exact samples from the uniform distribution, and the expected running time is bounded by $O(n^3 \ln N)$ under Condition 1.

Our preliminary computational experience indicates that Condition 1 is an important requirement for rapidity. For example, when we set r = (500, 500), s = (500, 498, 1, 1) and executed the monotone CFTP algorithm one thousand times, the average coalescence time was about 20. However, when we replaced s by (500, 1, 1, 498) and executed the algorithm one thousand times, the average coalescence time was about 2 million.

Although Algorithm 2 is fast enough, it needs to store the random numbers. We can save memory by using the read-once algorithm proposed by Wilson in [20]. For details, please see [20, 13]. Note that the modified algorithm terminates in $O(n^3 \ln N)$ time and the required memory is bounded by $O(n \ln N)$.

Our perfect sampling algorithm is applicable to the problem of counting $2 \times n$ contingency tables. In our previous work [16], we modified Dyer and Greenhill’s approximate counting scheme [10], and estimated the size of the bias of the expectation of the approximate solution. When we employ our perfect sampling algorithm in the algorithm proposed in [16], we can show that the total number of required samples is halved.

It is easy to extend our monotone CFTP algorithm to the $2 \times \cdots \times 2 \times J$ tables discussed in [17]. We can show that the extended algorithm is also a polynomial time uniform sampler. For conditional multinomial distributions, the existence of a polynomial time perfect sampler remains open.

Another major open problem is the existence of a monotone Markov chain for $m \times n$ contingency tables.

References 

[1] A. Agresti, “A survey of exact inference for contingency tables,” Statistical Science^ 
7 (1992), pp. 131-153. 

[2] A. Agresti, Categorical Data Analysis, John Wiley Sz Sons, 2002. 

[3] D. Aldous, “Random walks on finite groups and rapidly mixing Markov chains,” 
in Seminarie de Probabilities XVII 1981/1982, vol. 986 of Springer- Verlag Lecture 
Notes in Mathematics, D. Bold and B. Eckmann, ed.. Springer- Verlag, New York, 
1983, pp. 243-297. 




Sampling Algorithm for Contingency Tables 



185 



[4] R. Bubley, Randomized Algorithms: Approximation, Generation, and Counting, 
Springer- Ver lag, New York, 2001. 

[5] R. Bubley and M. Dyer, “Path coupling : a technique for proving rapid mixing 
in Markov chains,” Proceedings of the 38th Annual Symposium on Foundations of 
Computer Science (FOCS 1997), pp. 223-231. 

[6] M. Cryan, M. Dyer, L. A. Goldberg, M. Jerrum, and R. Martin, “Rapidly mixing 
Markov chains for sampling contingency tables with constant number of rows,” Pro- 
ceedings of the 43rd Annual Symposium on Foundations of Computer Science (FOCS 
2002), pp. 711-720. 

[7] P. Diaconis and B. Effron, “Testing for independence in a two-way table: new inter- 
pretations of the chi-square statistic (with discussion),” The Annals of Statistics, 13 
(1985), pp. 845-913. 

[8] P. Diaconis and L. Saloff-Coste, “Random walk on contingency tables with fixed 
row and column sums,” Tech, rep.. Department of Mathematics, Harvard University, 
1995. 

[9] X. K. Dimakos, “A guide to exact simulation,” International Statistical Review, 69 
(2001), pp. 27-48. 

[10] M. Dyer and C. Greenhill, “Polynomial-time counting and sampling of two-rowed
contingency tables,” Theoretical Computer Science, 246 (2000), pp. 265-278.

[11] M. Dyer, R. Kannan, and J. Mount, “Sampling contingency tables,” Random Struc- 
tures and Algorithms, 10 (1997), pp. 487-506. 

[12] R. A. Fisher, “The logic of inductive inference (with discussion),” Journal of the Royal
Statistical Society, 98 (1935), pp. 39-54.

[13] O. Häggström, Finite Markov Chains and Algorithmic Applications, London
Mathematical Society Student Texts 52, Cambridge University Press, 2002.

[14] D. Hernek, “Random generation of $2 \times n$ contingency tables,” Random Structures
and Algorithms, 13 (1998), pp. 71-79. 

[15] M. Jerrum and A. Sinclair, “The Markov chain Monte Carlo method: an approach
to approximate counting and integration,” in Approximation Algorithms for NP-hard
Problems, D. Hochbaum, ed., PWS, 1996, pp. 482-520.

[16] S. Kijima and T. Matsui, “Approximate counting scheme for $m \times n$ contingency
tables,” IEICE Transactions on Information and Systems, vol. E87-D (2004), pp.
308-314.

[17] T. Matsui, Y. Matsui, and Y. Ono, “Random generation of $2 \times \cdots \times 2 \times J$ contingency
tables,” METR 2003-03, Mathematical Engineering Technical Reports, University of
Tokyo, 2003. (available from http://www.keisu.t.u-tokyo.ac.jp/Research/techrep.0.html)

[18] J. Propp and D. Wilson, “Exact sampling with coupled Markov chains and appli- 
cations to statistical mechanics,” Random Structures and Algorithms, 9 (1996), pp. 
232-252. 

[19] J. Propp and D. Wilson, “How to get a perfectly random sample from a generic 
Markov chain and generate a random spanning tree of a directed graph,” J. Algo- 
rithms, 27 (1998), pp. 170-217. 

[20] D. Wilson, “How to couple from the past using a read-once source of randomness,” 
Random Structures and Algorithms, 16 (2000), pp. 85-113. 



Shuji Kijima 

Department of Mathematical Informatics, Graduate School of Information Science 

and Technology, University of Tokyo 

kijima@misojiro.t.u-tokyo.ac.jp







Tomomi Matsui 

Department of Mathematical Informatics, Graduate School of Information Science 

and Technology, University of Tokyo 

http://www.simplex.t.u-tokyo.ac.jp/~tomomi/




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



An Efficient Generic Algorithm for the 
Generation of Unlabelled Cycles¹

Conrado Martinez and Xavier Molinero 



ABSTRACT: In this paper we combine two recent generation algorithms
to obtain a new generic algorithm for the generation of all unlabelled cycles with
components from some class A and total size n. Sawada’s algorithm [13] lists all
k-ary unlabelled cycles with fixed content, that is, cycles in which the number of
occurrences of each symbol is fixed and given a priori. The other algorithm [8], by
the authors, generates all multisets of objects with given total size n from any
admissible unlabelled class A. By admissible we mean that the class can be specified
using the $\epsilon$-class, atomic classes, disjoint unions, products, sequences,
(multi)sets, etc. The resulting algorithm, which is the main contribution of this
paper, generates all cycles of objects with given total size n from any unlabelled
admissible class A. Given the generic nature of the algorithm, it is suitable for
inclusion in combinatorial libraries and for rapid prototyping. The new algorithm
incurs constant amortized time per generated cycle, the constant depending only on
the class A to which the objects in the cycle belong.



1. Introduction 

The generation of unlabelled cycles (also called necklaces) probably poses the
most difficult problems if we compare it with the generation of other common
combinatorial constructs (see for instance [4, 10, 11]). But in the last few years
there has been notable progress on the generation of necklaces, as
witnessed by the work of Wang and Savage [17], Ruskey and Sawada [12, 14],
Cattell et al. [1] and Sawada [13].

Recall that a necklace or unlabelled cycle is a sequence of symbols that is
lexicographically no larger than any of its circular shifts. Thus, abadc is
a cycle but adcab is not. Of course, we assume a certain order among the symbols
in order to properly define the notion of cycle. When a cycle is aperiodic it is called
a primitive cycle or Lyndon word; thus the cycle abadc is a Lyndon word, but the
cycle abab is not.
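
The definition can be checked directly by comparing a word with all of its circular shifts. The following Python sketch is purely illustrative: the function names are hypothetical, symbols are assumed to be ordered as ordinary characters, and the quadratic-time test is not the technique used by the generation algorithms discussed in this paper.

```python
def rotations(word):
    """Return every circular shift of word, including word itself."""
    return [word[i:] + word[:i] for i in range(len(word))]


def is_necklace(word):
    """True if no circular shift of word is lexicographically smaller."""
    return all(word <= rot for rot in rotations(word))


def is_lyndon(word):
    """True if word is strictly smaller than each of its proper shifts."""
    return all(word < rot for rot in rotations(word)[1:])


if __name__ == "__main__":
    # The three words used as examples in the text.
    for w in ["abadc", "adcab", "abab"]:
        print(w, is_necklace(w), is_lyndon(w))
```

Under these assumptions the word abab passes the necklace test but fails the Lyndon test, because it coincides with one of its proper shifts.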

In our recent works [6, 8, 9] we have shown how to design general algorithms 
to iterate through all the objects of a given size in labelled and unlabelled ad- 
missible combinatorial classes, such as those constructed using disjoint unions, 
products, sets and multisets, sequences, substitutions, etc. We use the adjective 
general above in the sense that the algorithms receive as their input both the size 
n of the objects and a finite specification of the combinatorial class which the 
objects belong to. In this paper we present an iteration algorithm for unlabelled 



¹This research was supported by the Future and Emergent Technologies programme of the EU
under contract IST-1999-14186 (ALCOM-FT) and the Spanish “Ministerio de Ciencia y
Tecnología” programme TIC2002-00190 (AEDRI II).







cycles. Contrary to the other algorithms that we have designed so far, the
algorithm for unlabelled cycles is not based upon a suitable recursive decomposition
of this combinatorial construction; hence, it cannot be used as the basis for an
efficient unranking algorithm (that is, given a rank i and a size n, generate the
i-th object of size n) for unlabelled cycles, which remains an open problem.

The framework presented in [6, 8, 9] follows the approach that was pioneered
by Flajolet et al. [3] for the random generation of combinatorial objects, and later
applied by the authors to the unranking of combinatorial objects [7]. Together with
the present paper, these papers show the viability of this elegant approach for an
effective and efficient solution of the “big four”: counting, random generation,
exhaustive/iterative generation and unranking. We have been able to prove that all
the proposed iteration algorithms for decomposable objects (sets, sequences,
labelled cycles, ...) run in constant amortized time (CAT) per generated object,
provided that the class can be finitely specified using $\epsilon$-classes (a single object
of size 0), atomic classes (a single object of size 1), disjoint unions, products,
sequences, substitutions, sets and multisets. We also provided there a CAT
algorithm for labelled cycles, but unlabelled cycles eluded our efforts. The proposed
algorithms do not perform better than state-of-the-art algorithms, but do not
perform much worse either. Because of their general flavor, they are useful for
rapid prototyping and for inclusion in general combinatorial libraries like
combstruct for Maple or MuPAD-combinat for MuPAD.

This paper is organized as follows. In Section 2 we review a few basic concepts 
and notation. Sections 3 and 4 present our algorithm, which is based upon our 
algorithm for unlabelled multisets in [8] and the recent algorithm to generate 
unlabelled cycles of fixed content of Sawada [13]. We prove that the resulting
algorithm has good performance, namely, it runs in constant amortized time. Finally,
in Section 5 we discuss our current and future work on this topic and report on
our preliminary implementation of the algorithm.



2. Preliminaries 

In this paper we consider unlabelled admissible classes of combinatorial structures 
and in particular, unlabelled cycles. Most of the material in this section is standard 
and can be found elsewhere, see for instance [2, 15, 16]. However, to make the 
paper more self-contained and to fix notation, we will briefly introduce some basic 
definitions and concepts. We begin with the formal definition of a combinatorial 
class. 

Definition 2.1. A combinatorial class is a pair $(\mathcal{A}, |\cdot|)$ such that $\mathcal{A}$ is a finite or
infinite denumerable set and $|\cdot| : \mathcal{A} \to \mathbb{N}$ is a size function such that, for all $n \ge 0$,
$\mathcal{A}_n = \{a \in \mathcal{A} : |a| = n\}$ is finite.

If no confusion arises, we will use the same name for the class and for the
set of objects belonging to that class. Also, we use subscripts under a class name
to denote the subset of objects of that class with a given size. Typically, complex
objects in a given class are composed of smaller units, called atoms and generically
denoted by Z. Atoms are objects of size 1 and the size of an object is the number
of atoms it contains. For instance, a string is the concatenation of
symbols, where each of these is an atom, and the size of the string is its length,
that is, the number of symbols it is composed of. Similarly, a tree is made up of nodes
(its atoms) and the size of the tree is its number of nodes. Objects of size 0 are
normally denoted by $\epsilon$.







Two main types of combinatorial classes can be defined depending on whether
the atoms that compose a given object can be distinguished or not. In the former
case, we say the class is labelled whereas in the latter we say the class is unlabelled.

As will become apparent, an efficient solution to the problem of counting
(namely, given a specification, a class and a size, compute the number of objects of
the given size) is fundamental for our solution of the iteration problem. Hence, we
turn our attention to admissible combinatorial classes, that is, those classes that
can be constructed from admissible constructors. An admissible constructor is an
operation over classes that yields a new class, and such that the number of objects
of a given size in the new class can be computed from the number of objects of
that size or smaller sizes in the constituent classes.

In order to formalize the notion of admissibility, we need the fundamental 
notion of counting generating functions. 

Definition 2.2. The (counting) generating function of an unlabelled combinatorial
class $\mathcal{A}$ is the ordinary generating function for the sequence $\{a_n\}_{n \ge 0}$. That is,

$$A(z) = \sum_{n \ge 0} a_n z^n = \sum_{\alpha \in \mathcal{A}} z^{|\alpha|},$$

where $a_n = \#\mathcal{A}_n$ is the number of objects in $\mathcal{A}$ of size $n$. The $n$-th coefficient of
$A(z)$ is denoted $[z^n]A(z)$, i.e., $a_n = [z^n]A(z)$.

Now we can define admissible operators. 

Definition 2.3. An operation $\Psi$ over combinatorial classes $\mathcal{A}_1, \mathcal{A}_2, \ldots, \mathcal{A}_k$ is
admissible if and only if there exists some operator $\Phi$ over the corresponding generating
functions $A_1(z), \ldots, A_k(z)$ such that

$$\mathcal{C} = \Psi(\mathcal{A}_1, \ldots, \mathcal{A}_k) \implies C(z) = \Phi(A_1(z), \ldots, A_k(z)),$$

where $C(z)$ is the generating function of $\mathcal{C}$.

Finally, we can define admissible specifications. 

Definition 2.4. An admissible specification $S$ is a collection of equations of the
form

$$\mathcal{A}_i = \Psi_i(\mathcal{A}_{j_1}, \ldots, \mathcal{A}_{j_{k_i}}),$$

where no two equations have the same left-hand side, each $\Psi_i$ is an admissible
operation, and each $\mathcal{A}_{j_l}$ is either an $\epsilon$-class, an atomic class, or there is an
equation in the collection with that class as its left-hand side. An $\epsilon$-class is a class that
contains a single object of size 0. Each of the classes that appear in the left-hand
sides of the equations in $S$ is said to be specified by $S$.

If a class A is specified by an admissible specification, then the class itself 
is called admissible. Admissible classes are also called decidable or well-founded 
classes [18]. Table 1 lists some admissible operators and the corresponding ordinary 
generating functions (OGFs). 

Though the collection of operations given above is small, it can be used to 
describe many important and useful combinatorial classes. 

Admissible classes must have a finite number of objects for any size, i.e.,
$a_n < \infty$ for any $n \in \mathbb{N}$. Hence, some restrictions over the classes are needed. For
instance, Seq(A) and Set(A) require that $a_0 = 0$; and Subst(A, B) requires that
either $b_0 = 0$ or A is finite.







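Class                      OGF

Union(A, B) = A + B        $A(z) + B(z)$
Prod(A, B) = A × B         $A(z) \cdot B(z)$
Seq(A)                     $\dfrac{1}{1 - A(z)}$
PowerSet(A)                $\exp\left(\sum_{n>0} (-1)^{n-1} \dfrac{A(z^n)}{n}\right)$
Set(A)                     $\exp\left(\sum_{n>0} \dfrac{A(z^n)}{n}\right)$
Cycle(A)                   $\sum_{n>0} \dfrac{\phi(n)}{n} \ln \dfrac{1}{1 - A(z^n)}$
Subst(A, B)                $A(B(z))$

Table 1. Unlabelled admissible combinatorial operators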



From now on, by an admissible class we mean that the class can be finitely
specified using the $\epsilon$-class (the class with a single object of size 0), atomic classes
(classes that contain a single object of size 1), disjoint unions (Union or ’+’),
products (Prod or ’×’), sequences (Seq), multisets (Set), power sets (PowerSet) and
cycles (Cycle) of admissible classes. Furthermore, the techniques and results
presented can be easily extended to variants of the cycle operator which are admissible
operators, in particular, to cycles with a restricted number of components. As we
have already mentioned in the introduction, we have developed in previous works
efficient algorithms to cope with all the combinatorial operators mentioned in the
list above, except for unlabelled cycles (Cycle). In this paper, we fill that gap.
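
Since counting underlies the whole approach, it may help to see how the number of unlabelled cycles of each size can be extracted from the counts of the component class. The Python sketch below is only an illustration under stated assumptions: it uses the standard generating function of the unlabelled cycle construction, $\sum_{k \ge 1} \frac{\phi(k)}{k} \ln \frac{1}{1 - A(z^k)}$, truncated power series with exact rational arithmetic, and hypothetical function names; it is not the counting machinery of the framework in [6, 8, 9].

```python
from fractions import Fraction
from math import gcd


def totient(k):
    """Euler's totient: how many 1 <= i <= k are coprime to k."""
    return sum(1 for i in range(1, k + 1) if gcd(i, k) == 1)


def seq_counts(a, n_max):
    """Coefficients of 1 / (1 - A(z)) up to z^n_max (requires a[0] == 0)."""
    s = [1] + [0] * n_max
    for n in range(1, n_max + 1):
        s[n] = sum(a[j] * s[n - j] for j in range(1, n + 1))
    return s


def cycle_counts(a, n_max):
    """Number of unlabelled cycles of A's of each size 0..n_max.

    a[j] is the number of objects of size j in A; len(a) must exceed n_max.
    """
    s = seq_counts(a, n_max)
    # log_s[n] = [z^n] ln(1 / (1 - A(z))), obtained from S'(z) = L'(z) S(z).
    log_s = [Fraction(0)] * (n_max + 1)
    for n in range(1, n_max + 1):
        acc = Fraction(n * s[n])
        for i in range(1, n):
            acc -= i * log_s[i] * s[n - i]
        log_s[n] = acc / n
    counts = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        total = sum(
            (Fraction(totient(k), k) * log_s[n // k]
             for k in range(1, n + 1) if n % k == 0),
            Fraction(0),
        )
        counts[n] = int(total)  # the sum is always an integer
    return counts


if __name__ == "__main__":
    # Components: one object of each size >= 1 (positive integers).
    print(cycle_counts([0] + [1] * 25, 25)[25])
```

With one component of every positive size, the sketch yields 1342183 cycles of total size 25; with two atoms it yields 2, 3, 4 and 6 cycles of sizes 1 to 4.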



3. Generating Cycles: The Fundamentals 

Let A be some admissible unlabelled class and assume that we already have an 
efficient procedure next to list all objects of a given size n in A, one at a time. The 
(very) high-level pseudocode of the algorithm is given in Figure 1. There p points 
to the root of the tree-like data structure that represents the current combinatorial 
object; in particular we assume that the current object is an unlabelled cycle of 
A’s. Each node in the data structure contains a field size and a field class that 
contain the size and the class of the object represented by the subtree rooted at 
the given node, respectively. Similarly, there is a field count and a field rank on 
each node that gives the number of objects in the same class with the same size, 
and the rank of the object within its class. The nodes that represent cycle objects
have additional fields, including the fields called cycle_rank and cycle_count, which
give the rank of the cycle within the set of cycles that can be built using the same
components as the current cycle and the number of cycles in that set. For instance,
the root node of the tree representing the cycle

c = (b, aa, aaa, ba, aca, aa, b, bb)

has cycle_count = 1260, since that many different cycles can be constructed using
the same components as c (there are 8 components with multiplicities 2, 2, 1, 1, 1, 1,
and 8!/(2! · 2! · 8) = 1260). The notation p -> field denotes the field named field
in the node pointed to by p.

In order to generate all the cycles using the components of the basis multiset
$\gamma$ we need to introduce some order among them. If two components $a_1$ and $a_2$ have
the same size then their order is given by their respective ranks in A (i.e., the order







// p points to the root of the tree representing a cycle of A's
function next_cycle(p)
    if p -> rank = p -> count then
        // the object is the last possible cycle; just signal the end
        p -> is_last := true
        return
    endif

    if p -> cycle_rank = p -> cycle_count then
        // no more cycles can be built with the same
        // (multi)set of components;
        // generate the next multiset
        p := next_multiset(p)
        // set up for cycle generation
        return init_cycle_from_multiset(p)
    endif

    // Use Sawada's algorithm to generate the next cycle with the same
    // multiset of components
    return next_cycle_from_multiset(p)
endfunction



Figure 1. The generation of unlabelled cycles of A’s.



in which they are listed by the known next procedure). Otherwise, the object
of smaller size is considered smaller than the other object. We write $a_1 \preceq a_2$ to
denote that $a_1$ is smaller than or equal to $a_2$, in the sense above.

On the other hand, the order of multisets can be defined similarly. Let

$$\gamma = \{a_0 \cdot n_0, a_1 \cdot n_1, \ldots, a_{k-1} \cdot n_{k-1}\},$$
$$\gamma' = \{a'_0 \cdot n'_0, a'_1 \cdot n'_1, \ldots, a'_{k'-1} \cdot n'_{k'-1}\},$$

where the notation $a \cdot n$ indicates that the object $a$ appears $n > 0$ times in the
multiset². Assume, furthermore, that each multiset is presented in sorted order,
namely, $a_0 \preceq a_1 \preceq \cdots \preceq a_{k-1}$, and similarly for $\gamma'$. Then the relative order of $\gamma$ and $\gamma'$
follows from the obvious lexicographic order; thus $\gamma \prec \gamma'$ if they have a common
“prefix” of $\ell$ components and either $a_\ell \prec a'_\ell$, or $a_\ell = a'_\ell$ and $n_\ell > n'_\ell$.
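
The two orders just described can be expressed as sort keys. The Python sketch below is a hypothetical illustration, not the representation used by the algorithm: it assumes that a component is encoded as a (size, rank) pair, that a multiset is a dictionary from components to multiplicities, and that the multisets being compared have the same total size (so neither is a proper prefix of the other).

```python
def component_key(component):
    """Sort key of a component encoded as a (size, rank) pair.

    Smaller size first; for equal sizes, smaller rank first.
    """
    size, rank = component
    return (size, rank)


def multiset_key(multiset):
    """Sort key of a basis multiset {component: multiplicity}.

    Components are listed in increasing order. At the first position where
    two multisets differ, the smaller component comes first; if the
    components agree, the LARGER multiplicity comes first, hence the
    negated count.
    """
    items = sorted(multiset.items(), key=lambda item: component_key(item[0]))
    return [(component_key(c), -n) for c, n in items]


def precedes(g1, g2):
    """True if g1 comes strictly before g2 in the order on multisets."""
    return multiset_key(g1) < multiset_key(g2)


if __name__ == "__main__":
    a, b, c = (1, 0), (1, 1), (1, 2)   # three atoms with ranks 0, 1, 2
    gamma = {a: 3, b: 1, c: 1}         # {a.3, b, c}
    gamma_prime = {a: 2, b: 3}         # {a.2, b.3}
    print(precedes(gamma, gamma_prime))
```

In this encoding the multiset {a·3, b, c} precedes {a·2, b·3}, because both start with the component a and the first one contains more copies of it.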

Notice that the procedure next_cycle will generate all cycles of size n in
Cycle(A) if we find a way to solve its two main steps; however, the cycles will
not be generated in lexicographic order. For instance, take the basis multisets
$\gamma = \{a \cdot 3, b, c\}$ and $\gamma' = \{a \cdot 2, b \cdot 3\}$, with $\gamma \prec \gamma'$. Hence the cycle aacab will be
generated before the cycle aabbb, yet the latter is lexicographically smaller
than the former. If we defined the order between multisets in such a way that
$\gamma' \prec \gamma$, we would produce ababb before aaabc, but the second is lexicographically
smaller than the first, so the problem is inherent to the structure of our algorithm,
not to the way we have defined the order of multisets.

Another key ingredient of the algorithm is a suitable representation of cycles, 
so that next_cycle can be efficiently implemented. The representation of cycles 
is conditioned by our already existing framework for the generation of other com- 
binatorial objects; some modifications need to be introduced into that framework 
in order to accommodate the generation of cycles. 

²We will usually omit the notation $\cdot n$ if $n = 1$, though.







We use binary trees to represent combinatorial objects. We also have a special
pointer to mark the last updated node in the object to further speed the generation
of the successor of the object. We call the resulting data structure an iterator.
Leaves in the binary tree contain the atoms and $\epsilon$ components of the object; the
internal nodes are labelled by operators: ’×’, etc. Each internal node contains
information about the subobject represented by the corresponding subtree. In
particular, each node holds the size of the subobject (size), the specification of
the class that the object belongs to (class), the rank of the object within its class
(rank), and the number of objects of that size in the class (count). Additional fields
are used for some particular nodes, as we describe in the next section. Furthermore,
each node has pointers to its children and to its parent.
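
As a rough picture of such a node, the following Python sketch declares the fields listed above. It is an assumption-laden illustration: the names `Node`, `kind`, `cls` and `set_children` are hypothetical, and the actual implementation (in Maple) may organise this information differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # nodes are compared by identity, not by content
class Node:
    kind: str    # operator label ('prod', 'multiset', ...) or 'atom' / 'epsilon'
    cls: str     # specification (here just a name) of the object's class
    size: int    # size of the subobject rooted at this node
    rank: int    # rank of the subobject within its class and size
    count: int   # number of objects of that size in the class
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    parent: Optional["Node"] = None

    def set_children(self, left=None, right=None):
        """Attach children and keep their parent pointers consistent."""
        self.left, self.right = left, right
        for child in (left, right):
            if child is not None:
                child.parent = self


if __name__ == "__main__":
    # A product of two atoms, i.e. an object of size 2.
    x = Node("atom", "Z", size=1, rank=0, count=1)
    y = Node("atom", "Z", size=1, rank=0, count=1)
    pair = Node("prod", "Z x Z", size=2, rank=0, count=1)
    pair.set_children(x, y)
    print(pair.size, x.parent is pair)
```

Keeping a parent pointer in every node is what allows an iterator to walk upwards from the last updated node when it looks for the next subobject to advance.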

In order to be able to cope with the nested generation of cycles, a ’cycle’ 
node is similar to a ’multiset’ node and the object represented by the subtree 
beneath a ’cycle’ is actually the representation for what we have called the basis 
multiset; the ’cycle’ node contains also an auxiliary structure that establishes how 
the components of the basis multiset are arranged in the current cycle. While
generating all the cycles with a common basis multiset, the subtree below the
’cycle’ node is left untouched; only the auxiliary structure is modified. When all
the cycles with the same basis multiset have been generated, the algorithm to
generate the next multiset is applied to the ’cycle’ node as if it were a ’multiset’
node. Afterwards, we apply init_cycle_from_multiset to update the specific
auxiliary structure associated with the ’cycle’ node.



4. Generating Cycles: The Details 

In this section we give a detailed description of the two components of our al- 
gorithm. Also, we have a close look at the representation of multiset and cycle 
objects. 

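We consider first the representation of multisets and how they are generated.
The generation of multisets is based on the following useful recursive
decomposition, valid for $n > 0$:

$$\mathrm{Set}(\mathcal{A})_n = \bigcup_{j=1}^{n} \; \bigcup_{p=\lfloor n/j \rfloor}^{1} \mathrm{Set}(\mathcal{A}_j, \mathrm{card} = p)_{j \cdot p} \times \mathrm{Set}(\mathcal{A}_{>j})_{n - j \cdot p} \qquad (1)$$

where $\mathrm{Set}(\mathcal{A}_j, \mathrm{card} = p)$ is the class of multisets with exactly $p$ components which are
elements of the class $\mathcal{A}$ of size $j$, and $\mathrm{Set}(\mathcal{A}_{>j})$ is the class of multisets whose components
are objects in $\mathcal{A}$ with size strictly larger than $j$.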

Therefore (1) tells us that a multiset consists of a multiset of $p$ components
of minimal size, say $j$, times a multiset of components of larger size. The first
part of the multiset is what we call a multiset block or mblock for short. In our
representation, a ’multiset’ node has a left son of size $\ell$ whose root is a ’mblock’
node and a right son of size $n - \ell$ whose root is a ’multiset’ node. To compute
the next multiset, the algorithm next_multiset is recursively applied to the right
son; but if no more multisets of size $n - \ell$ can be generated, then we apply the
next_mblock algorithm to the left son and initialize the right son with the smallest
multiset of size $n - \ell$ that does not contain components of size smaller than or equal
to the size of the objects in the ’mblock’ node. In order to make this task easier,
each ’multiset’ node carries an additional field min_size which stores the minimal
possible size among its components. If there is no ’mblock’ object following the
one represented by the current left son, then we initialize both the left son and







Figure 2. Internal representation of a Set-object. 



the right son with appropriate ’mblock’ and ’multiset’ objects of size $\ell'$ and $n - \ell'$,
respectively. We increase the minimal size $j$ for the components of the multiset,
and initialize a mblock as large as possible, together with a (possibly empty)
multiset whose components are of size strictly larger than the (new) $j$.
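
Decomposition (1) translates almost literally into a recursive enumerator. The Python sketch below is a simplified illustration and not the iterator-based next_multiset procedure described above: it assumes that an object of the component class is identified by a hypothetical (size, rank) pair, that `a[j]` gives the number of objects of size j, and it makes no attempt to achieve constant amortized time.

```python
from itertools import combinations_with_replacement


def multisets(a, n, min_size=1):
    """Yield every multiset of total size n built from objects of size
    >= min_size, where a[j] is the number of objects of size j.

    Each multiset is a list of (size, rank) pairs. Following (1), j is the
    minimal component size, p the number of components of that size
    (largest p first), and the remaining components are strictly larger.
    """
    if n == 0:
        yield []
        return
    for j in range(min_size, n + 1):
        available = a[j] if j < len(a) else 0
        if available == 0:
            continue
        for p in range(n // j, 0, -1):
            # An mblock: p components of size j, ranks in weakly
            # increasing order, repetitions allowed.
            for ranks in combinations_with_replacement(range(available), p):
                block = [(j, r) for r in ranks]
                for rest in multisets(a, n - j * p, j + 1):
                    yield block + rest


if __name__ == "__main__":
    # One object of each positive size: multisets are integer partitions.
    for m in multisets([0, 1, 1, 1, 1], 4):
        print([size for size, _ in m])
```

Because j is the minimal size and p the exact number of components of that size, every multiset is produced exactly once; for the class with one object per positive size the enumeration reduces to listing integer partitions.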

A ’mblock’ node represents a set of $p$ components which belong to the same
class and all have the same size $j$. The mblock recursively decomposes into its
component of smallest rank, called its leader, occurring a certain number of times,
say $r$, followed by a mblock with $p - r$ components whose ranks are strictly larger
than the rank of the leader. Thus a ’mblock’ node has a left son which is a ’delta’ node representing
the smallest component in the mblock and its number of occurrences, and a right
son which is a mblock itself. Each mblock node has an additional field min_rank
which ranges between 0 and $a_j - 1$ (where $a_j$ is the number of objects in A of size $j$) and
indicates the minimal possible rank for any component of the mblock.

Figure 2 schematically depicts the internal representation of the multiset

{aa, aa, ba, aba, aba, abb, bbb, abaa, bbba, bbbaab}

= {aa · 2, ba, aba · 2, abb, bbb, abaa, bbba, bbbaab} ∈ Set(Seq(a + b)).

The ’multiset’ nodes are represented by oval nodes, together with the values of the
size and min_size attributes. Nodes that represent ’mblocks’ are represented by
circular nodes with thick border; within each ’mblock’ node we write the values
of the size and min_rank attributes. The components of the multiset belong to
Seq(a + b) but we have shown them “collapsed” into single rectangular nodes.
Additionally, the diamond-shaped nodes represent ’delta’ nodes, and store in the
field occurrences the number of times that the particular component occurs in the
multiset.






Computing the successor of a mblock follows the same pattern as the
computation of the successor of a multiset: given a mblock m = (c, m') we recursively
try to compute the successor of the mblock m' (represented by the right subtree
of m), while maintaining the same leader; if this were not possible, we compute
the successor of the leader c, and the first mblock with components whose rank is
larger than the rank of the new leader replaces m'.

The computation of the next leader (a ’delta’ object) is also quite easy.
Suppose that the current ’delta’ object is $a \cdot k$, of size $\ell = |a| \cdot k$. The ’delta’ node
has a field occurrences to carry the information about the number of repetitions
$k$ of the object $a$, which is represented in the left subtree. The right subtree of
the ’delta’ node is simply discarded. If $a$ is not the last object in A of its size and
$k > 0$, then the next ’delta’ object is $a \cdot (k - 1)$; but if $k = 0$ then the next object
is $a' \cdot (n/j)$, where $a'$ is the successor of $a$ in A and $n$ is the size of the mblock.

With this scheme, multisets are not generated in lexicographic order, but as
we have discussed, this is not very relevant as the cycles cannot be generated in
lexicographic order either.

Provided that all objects of a given size in A can be generated in time
proportional to the total number of generated objects, it can be shown that the
algorithm above generates all multisets of size n of A’s in time proportional to the
total number of generated multisets [8].



We use Sawada’s algorithm [13] to generate all the cycles with a given basis
multiset. This recent algorithm is able to produce all possible cycles that contain
$r_0$ occurrences of the symbol 0, $r_1$ occurrences of symbol 1, etc., i.e., all $k$-ary
cycles of fixed content. In our algorithm, the components of the basis multiset are
the “symbols” to work with.
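
To make the notion of cycles of fixed content concrete, the Python sketch below lists them by brute force: it builds every arrangement of the given content and keeps those that are no larger than any of their circular shifts. This is a naive stand-in with a hypothetical function name, useful only for checking small cases; it is not Sawada’s algorithm and does not run in constant amortized time.

```python
def fixed_content_necklaces(content):
    """Return all cycles with content[i] occurrences of symbol i.

    Brute force: enumerate the arrangements of the multiset of symbols in
    lexicographic order and keep the canonical (smallest) representative
    of each class of circular shifts.
    """
    n = sum(content)
    remaining = list(content)
    word = [0] * n
    result = []

    def is_necklace(w):
        return all(w <= w[i:] + w[:i] for i in range(1, len(w)))

    def extend(pos):
        if pos == n:
            if is_necklace(word):
                result.append(tuple(word))
            return
        for symbol in range(len(remaining)):
            if remaining[symbol] > 0:
                remaining[symbol] -= 1
                word[pos] = symbol
                extend(pos + 1)
                remaining[symbol] += 1

    extend(0)
    return result


if __name__ == "__main__":
    # Two copies of symbol 0 and two copies of symbol 1.
    print(fixed_content_necklaces([2, 2]))
```

In the setting of this paper each symbol i would stand for one distinct component of the basis multiset, and content[i] for its number of occurrences.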

To start the generation of cycles, we first initialize the basis multiset
$\gamma = \{a_0 \cdot r_0, \ldots, a_{k-1} \cdot r_{k-1}\}$ of the ’cycle’ object. This is done, except the first time,
by applying the next_multiset algorithm to the root node of the ’cycle’ object.

Then the procedure init_cycle_from_multiset performs the following tasks.
First, it prepares a list L with pointers to the ’delta’ objects $a_i \cdot r_i$ and initializes
an array R with the number of repetitions $r_i$ of each ’delta’ object. Both the list
L and the array R are part of the auxiliary structure attached to the ’cycle’ node.
Because of the requirements of Sawada’s algorithm the list L should be filled in
reverse order (the first element in the list points to $a_{k-1}$ and so on) and then L should
be rearranged to make sure that the first element in L points to the component
with the largest value $r_i$. Of course, the array R should be rearranged accordingly.
Sawada’s algorithm requires that symbols are indexed to make sure that the largest
symbol in lexicographic order has the largest number of occurrences, in order to
generate all cycles in (reverse) lexicographic order. Although it would not be too
difficult to adapt our algorithm so that the list L and the array R are arranged as Sawada’s
algorithm dictates, and to undo both the reversing and re-indexing when “printing”
the current object, we do not take any further steps in this respect, because we
will not be able to generate all cycles of A’s in lexicographic order anyway. So the
initialization of L and R is simpler in our setting.

We also have to initialize the rank of the current cycle to 0 (cycle_rank)
and compute the number of cycles (cycle_count) that can be generated with that
particular basis multiset [5]:



$$C(N_0, \ldots, N_{k-1}) = \frac{1}{N} \sum_{j \mid \gcd(N_0, \ldots, N_{k-1})} \phi(j) \, \frac{(N/j)!}{(N_0/j)! \cdots (N_{k-1}/j)!}$$







where $\phi(x)$ is Euler’s totient function (the number of positive integers not
exceeding $x$ that are relatively prime to $x$) and
$N = N_0 + N_1 + \cdots + N_{k-1}$ (observe that $N$ is not necessarily the size of the cycle,
as the components are not of size 1, in general). Finally, the ’cycle’ node contains
also an array V of pointers that gives the arrangement of the components within
the current cycle, a field arity with the number of distinct components that appear
in the basis multiset and a field length with the number of components of the cycle.
All this additional information, in particular L and R, can be easily initialized
while initializing the basis multiset, introducing a few minor modifications in the
corresponding algorithm. The array V plus the length field might be thought of as
the real representation of the cycle, while the other fields are just necessary for
the computation of the successor.
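
The number of cycles with a given basis multiset, i.e. the value stored in cycle_count, can be computed from the multiplicities alone by summing over the divisors of their greatest common divisor. The Python sketch below is a minimal illustration under the assumption that all multiplicities are positive; the function names are hypothetical.

```python
from functools import reduce
from math import factorial, gcd


def totient(j):
    """Euler's totient: how many 1 <= i <= j are coprime to j."""
    return sum(1 for i in range(1, j + 1) if gcd(i, j) == 1)


def cycle_count(multiplicities):
    """Number of cycles whose components occur with the given
    (positive) multiplicities N_0, ..., N_{k-1}."""
    n = sum(multiplicities)
    g = reduce(gcd, multiplicities)
    total = 0
    for j in range(1, g + 1):
        if g % j != 0:
            continue
        # Multinomial coefficient (N/j)! / ((N_0/j)! ... (N_{k-1}/j)!).
        arrangements = factorial(n // j)
        for m in multiplicities:
            arrangements //= factorial(m // j)
        total += totient(j) * arrangements
    return total // n


if __name__ == "__main__":
    # Multiplicities of the components of c = (b, aa, aaa, ba, aca, aa, b, bb).
    print(cycle_count([2, 2, 1, 1, 1, 1]))
```

For the cycle c of Section 3 the multiplicities are 2, 2, 1, 1, 1, 1, their greatest common divisor is 1, and the sum has a single term, 8!/(2! · 2!) = 10080, which divided by N = 8 gives 1260.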

With some effort, Sawada’s algorithm can be transformed into an iterative
version which is more convenient for our purposes, although more involved than the
original recursive version. The algorithm works by appending symbols to suitable
prefixes; in order to avoid linear searches of the available symbols, the global list
L of “available” symbols is used. Also, in the transformation from its original
recursive formulation to the iterative version, we have to include additional fields
in the ’cycle’ node to keep a record of the current state of the computation. In particular,
we need a field longest_Lyn_pref with the length of the longest Lyndon prefix of
the current cycle.

Recall that when all the cycles with the same basis multiset are exhausted,
the next_multiset algorithm is applied to obtain a new basis multiset; the list L,
the array R and the other fields of the auxiliary structure are updated accordingly.
In principle, updating the array R and the list L would need some non-constant
amount of work. But when Sawada’s algorithm ends, both L and R have recovered
their initial contents. Again a minor modification of the next_cycle algorithm will
allow us to make no more changes to update R and L than to generate the new
basis multiset (in the amortized sense). So to speak, most of the time only one or
two components of the multiset will be modified and thus only a constant amount
of work will be necessary to update R and L.



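Sawada’s algorithm generates the $C(N_0, \ldots, N_{k-1})$ cycles in constant
amortized time per cycle. Denote by $\mathcal{C}(\gamma)$ the cost of generating all the cycles with basis
multiset $\gamma$ and by $\mathcal{G}(\gamma)$ the set of cycles with basis multiset $\gamma$. Thus the cost $C_n$ of
our algorithm is given by:

$$\begin{aligned}
C_n &= c \cdot \#\mathrm{Set}(\mathcal{A})_n + \sum_{\gamma \in \mathrm{Set}(\mathcal{A})_n} \left(\mathcal{C}(\gamma) + c'\right) \\
    &= (c + c') \cdot \#\mathrm{Set}(\mathcal{A})_n + \sum_{\gamma \in \mathrm{Set}(\mathcal{A})_n} \sum_{\nu \in \mathcal{G}(\gamma)} c'' \\
    &= c''' \cdot \#\mathrm{Set}(\mathcal{A})_n + c'' \cdot \#\mathrm{Cycle}(\mathcal{A})_n,
\end{aligned}$$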

where $c$, $c'$, $c''$ and $c''' = c + c'$ are constants. The constant $c$ is implied by our CAT
algorithm for multisets; the constant $c''$ is implied by Sawada’s algorithm for cycles
of fixed content, and $c'$ is the constant amount of work that needs to be done to
update the auxiliary structure attached to the corresponding ’cycle’ node each time
that we move from a basis multiset onto the next one. Since every basis multiset
gives rise to at least one cycle, $\#\mathrm{Set}(\mathcal{A})_n \le \#\mathrm{Cycle}(\mathcal{A})_n$
if $n > 0$; it follows that $C_n$ is bounded by a constant times the number of generated
objects, and thus it is a CAT algorithm too.







5. Final Remarks 

We have already conducted a few experiments with a preliminary implementation
in Maple of the algorithm described in this paper, with good results. In particular,
we have used the class $\mathcal{N} = \mathrm{Cycle}(\mathrm{Set}(Z, \mathrm{card} \ge 1))$ (cycles of integers) for our
experiments. For instance, all cycles of integers of total size 25 (there are 1342183
such cycles) are generated in 2269 seconds (0.0016911 seconds/cycle) using a
machine equipped with a Pentium processor at 1.7 GHz.

Although the basic ideas behind the algorithms (including the algorithm for
multisets and Sawada’s algorithm) are rather simple, the implementation details
are not. This is because the new algorithm for cycles operates rather differently from
the other generation algorithms with which it should be integrated (the algorithm
for multisets is prototypical in that respect, with a nice recursive decomposition
guiding the algorithm’s operation). A suitable recursive decomposition of cycles
would allow us to design an iteration algorithm which fits better the framework
developed in previous works. Also, such an algorithm would be more amenable
to a precise analysis of its performance. Last but not least, if such a recursive
decomposition were obtained, an efficient algorithm for the unranking of cycles
would suggest itself. Unfortunately, no such decomposition for unlabelled cycles
seems to exist.

Other related questions that we are now investigating include variants of the 
operators (for instance, cycles with restrictions on the number of components) and 
minor variations of the order in which objects are generated which could improve 
the overall performance of the process. 



References 

[1] Cattell, K., Ruskey, F., Sawada, J., and Serra, M. Fast algorithms to generate
necklaces, unlabeled necklaces, and irreducible polynomials over GF(2). J.
Algorithms 37 (2000), 267-282.

[2] Flajolet, P., and Sedgewick, R. The Average Case Analysis of Algorithms:
Counting and generating functions. Tech. Rep. 1888, INRIA, April 1993.

[3] Flajolet, P., Zimmermann, P., and Van Cutsem, B. A calculus for the random
generation of combinatorial structures. Theoret. Comput. Sci. 132, 1-2 (1994), 1-35.

[4] Fredricksen, H., and Kessler, I. An algorithm for generating necklaces of beads
in two colors. Discrete Mathematics 61 (1986), 181-188.

[5] Gilbert, E., and Riordan, J. Symmetry types of periodic sequences. Illinois J.
Mathematics 5 (1961), 657-665.

[6] Martínez, C., and Molinero, X. Generic algorithms for the exhaustive generation
of labelled objects. In Proc. of the 4th Workshop on Random Generation of
Combinatorial Structures and Bijective Combinatorics (GASCOM'01) (2001), pp. 53-58.

[7] Martínez, C., and Molinero, X. A generic approach for the unranking of labelled
combinatorial classes. Random Structures & Algorithms 19, 3-4 (2001), 472-497.

[8] Martínez, C., and Molinero, X. Generic algorithms for the generation of
combinatorial objects. In Proc. of the 28th Int. Symposium on Mathematical Foundations
of Computer Science (MFCS) (2003), B. Rovan and P. Vojtas, Eds., vol. 2747 of
Lecture Notes in Computer Science, Springer-Verlag, pp. 572-581.

[9] Martínez, C., and Molinero, X. Efficient iteration in admissible combinatorial
classes. Theoret. Comput. Sci. (2004). Submitted.

[10] Nijenhuis, A., and Wilf, H. S. Combinatorial Algorithms. Academic Press, 1978.







[11] Ruskey, F., Savage, C., and Wang, T. Generating necklaces. J. Algorithms 13
(1992), 414-430.

[12] Ruskey, F., and Sawada, J. A fast algorithm to generate unlabeled necklaces. In
Proc. of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA)
(2000), pp. 256-262.

[13] Sawada, J. A fast algorithm to generate necklaces with fixed content. Theoret.
Comput. Sci. 301, 1-3 (2003), 477-489.

[14] Sawada, J., and Ruskey, F. An efficient algorithm for generating necklaces with
fixed density. In Proc. of the 10th Annual ACM-SIAM Symposium on Discrete
Algorithms (SODA) (1999), pp. 752-758.

[15] Sedgewick, R., and Flajolet, P. An Introduction to the Analysis of Algorithms.
Addison-Wesley, Reading, MA, 1996.

[16] Vitter, J., and Flajolet, P. Average-case analysis of algorithms and data
structures. In Handbook of Theoretical Computer Science, J. van Leeuwen, Ed.
North-Holland, 1990, ch. 9.

[17] Wang, T., and Savage, C. A Gray code for necklaces of fixed density. SIAM J.
Discrete Math. 9, 4 (1996), 654-673.

[18] Zimmermann, P. Séries génératrices et analyse automatique d'algorithmes. PhD
thesis, École Polytechnique, March 1991.

Conrado Martinez and Xavier Molinero 

Departament de Llenguatges i Sistemes Informàtics

Universitat Politècnica de Catalunya

E-08034 Barcelona, Spain.

{conrado,molinero}@lsi.upc.es




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Using Tries for Universal Data Compression 

Yuriy A. Reznik and Anatoly V. Anisimov 



ABSTRACT: We show how a digital tree (or trie) structure can be used
for both parsing and encoding (in a Variable-Length to Block (VB) or Variable- 
Length to Variable-Length (VV) fashion) of sequences of symbols from a stochastic 
source. As an example, we construct a simple VB code based on a fixed database 
adaptation model, and derive an asymptotic expression for its average redundancy 
rate for memoryless sources. 



1. Introduction 

Given a set of $n$ distinct strings $S = \{s_1, \ldots, s_n\}$ from a binary alphabet
$A = \{0, 1\}$, a trie [1] $T(S)$ can be constructed recursively as follows. If $n = 0$, the trie
is empty. If $n = 1$, the trie is an external node (or leaf) containing a pointer to a
string in $S$. If $n \geq 2$, the trie is an internal node, containing pointers to two child
tries: $T(S_0)$ and $T(S_1)$, constructed from suffixes of strings in $S$ that begin with
symbols 0 and 1 correspondingly: $S_\alpha = \{s'_j \mid \alpha s'_j = s_j,\ \alpha \in A,\ s_j \in S\}$.

In a special case, when strings $s_1, \ldots, s_n$ represent successive suffixes of an
input sequence $Z = z_1 z_2 \ldots$, that is $s_i = z_i z_{i+1} \ldots$ with $z_i \in A$, the resulting
structure $T(Z)$ is called a suffix tree (see Fig. 1.a).

It is well known that the paths from the root to external nodes in a trie
represent shortest distinguishing prefixes of strings in $S$. However, these prefixes
may not form a complete prefix set (e.g., the trie in Fig. 1.a lacks prefixes 011 and 11).
This problem can be trivially solved by adding the missing nodes, or equivalently,
by extending the set of strings in a trie to a (smallest possible) complete prefix
set. We illustrate this procedure in Fig. 1.b, and call the resulting structure an
extended trie (or extended suffix tree).
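The recursive construction and the extension step can be illustrated with a small sketch. It is only an illustration under stated assumptions: the nested-dictionary representation and the names `build_trie` and `leaf_prefixes` are hypothetical, and the input uses only the nine symbols of Z that are printed in the caption of Fig. 1.a, which happen to be enough to separate its first 7 suffixes.

```python
def build_trie(strings, depth=0):
    """Recursive trie: None (empty), a str (leaf) or a dict with children '0' and '1'."""
    if not strings:
        return None
    if len(strings) == 1:
        return strings[0]
    groups = {"0": [], "1": []}
    for s in strings:
        groups[s[depth]].append(s)  # assumes no string is a prefix of another
    return {bit: build_trie(groups[bit], depth + 1) for bit in "01"}


def leaf_prefixes(trie, prefix=""):
    """Yield (prefix, missing) for every leaf slot; missing marks an empty child."""
    if trie is None:
        yield prefix, True
    elif isinstance(trie, str):
        yield prefix, False
    else:
        for bit in "01":
            yield from leaf_prefixes(trie[bit], prefix + bit)


Z = "000010100"                       # only the symbols visible in the caption of Fig. 1.a
suffixes = [Z[i:] for i in range(7)]  # its first 7 suffixes
slots = list(leaf_prefixes(build_trie(suffixes)))
present = [p for p, missing in slots if not missing]  # shortest distinguishing prefixes
added = [p for p, missing in slots if missing]        # leaves added by the extension
print(present)
print(added)
```

Turning every empty child slot of an internal node into a leaf is what makes the prefix set complete: afterwards every internal node has exactly two children, so the leaf prefixes satisfy the Kraft equality.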

In this abstract we propose to use extended tries for the construction of non-block
(VB or VV structured) source codes. As an example, we develop an algorithm
based on the fixed-database model of Wyner and Ziv [2].

Both encoder and decoder have access to a training sequence (or database) $Z$
produced by the source. This sequence is used to construct an extended suffix tree
$T(Z)$, which, in turn, is used to define a VB code for this source (see Fig. 1.c).
When the encoder receives a message, it uses the tree $T(Z)$ to parse the message into
words $x_1, x_2, \ldots$ (corresponding to paths in $T(Z)$), indices of which are then
transmitted using $\lceil \log_2 |T| \rceil$ bits, where $|T|$ is the total number of leaves in $T(Z)$.
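The parsing and fixed-length indexing just described can be sketched as follows. This is a minimal illustration, not the authors' encoder: the leaf set `LEAVES` is a hypothetical complete prefix set (the one obtained by extending the trie of Fig. 1.a), leaves are numbered in lexicographic order by assumption, and the function names are illustrative.

```python
# Hypothetical complete prefix set: the leaves of an extended trie.
LEAVES = ["0000", "0001", "001", "0100", "0101", "011", "100", "101", "11"]


def index_width(num_leaves):
    """Number of bits per index, i.e. ceil(log2(num_leaves)) for num_leaves >= 2."""
    return max(1, (num_leaves - 1).bit_length())


def vb_encode(message, leaves):
    """Parse message into leaf words and emit one fixed-width index per word.

    Returns (blocks, tail), where tail is the unparsed remainder of the message.
    """
    index = {word: i for i, word in enumerate(leaves)}
    width = index_width(len(leaves))
    blocks, word = [], ""
    for symbol in message:
        word += symbol
        if word in index:  # reached a leaf of the extended trie
            blocks.append(format(index[word], "0{}b".format(width)))
            word = ""
    return blocks, word


def vb_decode(blocks, leaves):
    """Inverse mapping: each fixed-width block is the index of a leaf word."""
    return "".join(leaves[int(block, 2)] for block in blocks)


blocks, tail = vb_encode("0000101000111", LEAVES)
print(blocks, repr(tail))
print(vb_decode(blocks, LEAVES))
```

Because the leaf set is a complete prefix set, the parse is unique and never gets stuck in the middle of the message; only a final incomplete word can remain in `tail`.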

2. Main Result 

Theorem 2.1. Consider a memoryless source over a binary alphabet $A = \{0, 1\}$
with probabilities $p = \Pr\{0\}$, $q = \Pr\{1\}$, $p + q = 1$. Let database $Z$ be a sequence
of $n$ symbols produced by this source. Then, the average redundancy rate of








Figure 1. A suffix tree (a), its extended version (b), and the corresponding VB code
for a memoryless source (c). Panels: (a) a trie built from 7 suffixes of Z = 000010100...;
(b) an extended version of the same trie; (c) the extended trie used as a VB code for a
Bernoulli source.



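a VB code based on an extended suffix tree $T(Z)$ is (with $n \to \infty$):

$$R = E\left\{ \frac{\lceil \log |T| \rceil}{d(T)} \right\} - h = \frac{C_1 + \delta_1(n, p)}{\log n} + o\!\left( \frac{1}{\log n} \right) \qquad (1)$$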



where $d(T)$ is the average depth of the tree $T(Z)$, $\delta_1(n, p)$ is
a fluctuating function of small amplitude, and $C_1$ is a constant, whose exact
formula is

Ci=h ^loge (1 + 7 ) + h~^+log , 

where $h = -p \log p - q \log q$ is the entropy of the source, $h_2 = p \log^2 p + q \log^2 q$,
and $\gamma = 0.57721\ldots$ is Euler's constant. All logarithms have the same base,
reflecting the unit of information (e.g. bits or nats) being used.



In other words, we can show (and based on [3], it should be possible to prove
it for Markovian sources as well) that the average redundancy of our code
decreases at a rate of $O(1/\log n)$, which is a noticeable improvement over the
$O(\log \log n / \log n)$ rate of the fixed-database version of the Lempel-Ziv code [2].
While a similar improvement has already been reported for a code using two training
sequences [4], the present algorithm should be more practical, as it requires only one
sequence.



References 



[1] D. Knuth, The Art of Computer Programming. Sorting and Searching. Vol. 3
(Addison-Wesley, Reading MA, 1973).

[2] A.D. Wyner and J. Ziv, Fixed Data Base Version of the Lempel-Ziv Data Compres- 
sion Algorithm, IEEE Trans. Inform. Theory, 37 (1991) 878-880. 

[3] P. Jacquet, and W. Szpankowski, Analysis of digital tries with Markovian depen- 
dency, IEEE Trans. Inform. Theory, 37 (1991) 1470-1475. 

[4] A.D. Wyner and A.J. Wyner, Improved Redundancy of a Version of the Lempel-Ziv 
Algorithm, IEEE Trans. Inform. Theory, 41 (1995) 723-732. 



Yuriy A. Reznik 

RealNetworks, Inc. 

Email: yreznik@ieee.org 

Anatoly V. Anisimov 

Kiev National T. G. Schevchenko University 
Email: ava@mi.unicyb.kiev.ua 




Part IV 

Trees 







New Strahler Numbers for Rooted Plane Trees 

David Auber, Jean-Philippe Domenger, Maylis Delest, 

Philippe Duchon, and Jean-Marc Fedou 



ABSTRACT: In this paper, we present an extension of Strahler numbers to 
rooted plane trees. Several asymptotic properties are proved; others are conjec- 
tured. We also describe several applications of this extension. 



1. Introduction 

The Strahler number of binary trees was introduced by the hydrogeologist Horton [11],
then refined by Strahler [19] and rediscovered in computer science by
Ershov [6], who interpreted it as the minimum number of registers allowing the
computation of an arithmetic expression. Numerous results have been obtained,
among which one can point out explicit expressions for their distribution [8], [12].
In enumerative combinatorics, the research mostly involves bijections mapping
Strahler numbers to other known statistics on other types of objects in order
to understand their generating series: between binary trees and Dyck paths by
Françon [10], between binary trees and forests of plane trees by Zeilberger [25] and
more recently between forests of plane trees and Dyck paths by Viennot [23]. The
common construction among these results is to redefine the Strahler number as a
pruning number on rooted plane trees.

A natural question arises: is there a natural definition of Strahler numbers for
rooted plane trees that extends Ershov's interpretation [6]? This paper proposes
such an extension and describes some related properties. This extension is different
from the one described and studied on general tries by Nebel [16] or Bourdon et
al. [5].

Apart from biology and hydrogeology, Strahler numbers have been used
in computer graphics to give synthetic images of trees and landscapes (see Viennot,
Eyrolles, Janey and Arques [22]), and in physics by Vannimenus and Viennot [20].
A survey paper can be found in [21]. Thus, Strahler numbers have proved
useful in many fields. It is natural to apply them to information visualisation
systems that deal with trees and graphs. Here, we present two applications of our
definition: navigation [2] and the search for "quasi-similar" subtrees [4].

After some definitions, we prove that the computation of these Strahler numbers
for all vertices of a tree of size $n$ can be performed in $O(n)$ time and space.
Then we prove that the branching ratio (to be defined later) of any simply
generated family of trees with finite degree (in the sense of Meir and Moon [14, 13])
tends to 4; for binary trees, it was known [18, 15] that $r_k = 4$ for all $k$. We end
with a brief description of applications to information visualization systems.







2. Definitions and notations 



A tree is a connected acyclic graph. A rooted tree is defined from a tree by choosing 
a vertex called the root; edges are oriented such that there exists a path from the 
root to each vertex of the tree. A vertex with no outgoing edge is called a leaf, 
other vertices are called internal vertices. A successor of a vertex is called a child 
of this vertex. The degree (or arity) of a vertex is the number of its children. In 
this paper, we consider rooted plane trees, i.e. rooted trees where a total order is 
imposed on the children of each vertex. 

Let $D$ be a set of nonnegative integers containing 0. We consider the set of
simply generated trees, $\mathcal{T}_D$, in the sense of Meir and Moon [14, 13]: each vertex
of a tree has its arity in $D$. In this setting, $d$-ary trees correspond to $D = \{0, d\}$;
$d = 2$ for binary trees.

The Strahler number was first introduced on binary trees in some work
about the morphological structure of rivers [11, 19]. The recursive definition
associates an integer value to each vertex of a binary tree. These values give
quantitative information about the complexity of each sub-tree of the original tree.
Let $s$ be a vertex of a tree:

• If $s$ is a leaf then $\sigma_b(s) = 1$,

• Else $s$ has two children $s_1$ and $s_2$ and

$$\sigma_b(s) = \begin{cases} \max(\sigma_b(s_1), \sigma_b(s_2)) & \text{if } \sigma_b(s_1) \neq \sigma_b(s_2), \\ \sigma_b(s_1) + 1 & \text{otherwise.} \end{cases}$$

The Strahler number of a tree is the integer value associated to its root. If $T$ is a
tree, $\sigma_b(T)$ will denote the Strahler number of $T$.



Definition 2.1. Let $T$ be a tree whose vertices are valued by Strahler numbers. A
branch of Strahler order $k$ is a maximal path $(s_0, s_1, \ldots, s_p)$ in $T$, such that for
each $i \in [0..p]$, $\sigma_b(s_i) = k$.

We denote by $\beta_k(T)$ the total number of branches in $T$ of Strahler order $k$.



Theorem 2.2. [18, 15] Let $B_{k,n}$ be the total number of branches of Strahler order
$k$ in the set of binary trees with size $2n + 1$, then

$$r_k = \lim_{n \to +\infty} \left( \frac{B_{k,n}}{B_{k+1,n}} \right) = 4.$$



This ratio is called the branching ratio of binary trees. Horton has shown [11]
that for real rivers this ratio is between 3 and 5. A generalization of Strahler
numbers to rooted trees is suggested by the nice interpretation by Ershov [6], who
proved that the Strahler number of the root of a binary tree is exactly the minimal
number of registers needed to compute an arithmetic expression given by the tree
(output by the syntactical analysis). Thus, one can define the Strahler number $\sigma$
on general trees by:

• if $s$ is a leaf then $\sigma(s) = 1$;

• otherwise $s$ has $k \geq 1$ children $s_0, \ldots, s_{k-1}$ such that $\sigma(s_i) \geq \sigma(s_j)$ if $i < j$ (i.e.
children are reordered for the Strahler computation) and

$$\sigma(s) = \max_{0 \leq i \leq k-1} (\sigma(s_i) + i).$$








Figure 1. Strahler number on one vertex (the root is labelled 7, obtained as a
maximum whose last term is 3 + 4).

Figure 2. Construction of trees of $\mathcal{G}_k$.



Equivalently, without requiring the children to be reordered, we have

$$\sigma(s) = \max \{ i + j : \#\{\ell : \sigma(s_\ell) \geq i\} > j \}.$$

See an example in Figure 1. As for binary trees, the Strahler number of a tree
is the Strahler number of its root. Note that on the set of binary trees, $\mathcal{T}_{\{0,2\}}$, $\sigma$ and
$\sigma_b$ coincide. The definition of the branching ratio according to the generalization
stays unchanged. We conjecture that a weaker (asymptotic) version of Theorem 2.2
holds for any family $\mathcal{T}_D$ and prove it for finite $D$. Note that if an internal vertex
has a Strahler value $s$ then it has at most $s$ children.
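The two equivalent forms of the definition can be compared on small inputs with the following sketch. The function names and the nested-list encoding of trees are illustrative assumptions, and the list of child values used in the example (5, 5, 4, 3, 3) is a hypothetical instance chosen to be compatible with Figure 1, whose third value is not legible here.

```python
def strahler_sorted(child_values):
    """Definition with children reordered by decreasing Strahler value."""
    if not child_values:
        return 1
    ordered = sorted(child_values, reverse=True)
    return max(value + rank for rank, value in enumerate(ordered))


def strahler_counting(child_values):
    """Order-free form: max of i + j over pairs with more than j children of value >= i."""
    if not child_values:
        return 1
    best = 0
    for i in range(1, max(child_values) + 1):
        count = sum(1 for value in child_values if value >= i)
        best = max(best, i + count - 1)  # the largest admissible j is count - 1
    return best


def strahler(tree):
    """Strahler number of a tree given as a nested list of subtrees ([] is a leaf)."""
    return strahler_sorted([strahler(child) for child in tree])


print(strahler_sorted([5, 5, 4, 3, 3]), strahler_counting([5, 5, 4, 3, 3]))
print(strahler([[], [], []]))  # a root with three leaf children
```

On two children the sorted form reduces to the binary rule: equal values a, a give a + 1, and distinct values give the larger one.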



3. Complexity of Strahler number computation

Strahler numbers are useful in information visualization, as we will show in
Section 6. In this field, the effective size of the trees is large: 500000 vertices is
average. Thus, the algorithms that compute Strahler numbers must be efficient.
In this section we describe a linear time algorithm, and prove that the nodes of
any tree can be sorted according to their Strahler numbers in linear time; this is
useful for the kind of applications we describe in Section 6.

3.1. Computing the Strahler valuation in linear time 

To compute the Strahler number of a node, one has to sort its children in ascending
or descending order of their own Strahler numbers. This sorting requirement,
together with the a priori possibility that some nodes can have a large number of
children, makes it non-obvious that a linear time algorithm exists to compute the
Strahler valuation on a tree.

The key to our algorithm is the following observation: if each node of the
tree is informed of the Strahler numbers of its children in increasing order, it only
needs two counters to compute its own Strahler number; also, if the tree has $N$
nodes, its Strahler number cannot be larger than $N$.

Thus, we set up an array $S$ of size $N$; the $k$-th cell of the array will contain a
list of vertices with Strahler number $k$, before their Strahler number is transmitted
to their father. Initially, the first list contains all the leaves of the tree. Then, until
all lists are empty, a vertex is extracted from the first nonempty list, and its
Strahler number is transmitted to its father; if the father now knows the Strahler






numbers of all its children, its Strahler number is then computed, and it is inserted
in the appropriate list.

Each node in the tree is examined a bounded number of times overall, so that
the total complexity of the algorithm is $\Theta(N)$.
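The bucket procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tree is assumed to be given as a parent array (with -1 for the root), and the names `strahler_all`, `received` and `best` are hypothetical. The two per-node counters are the number of child values received so far and the running maximum.

```python
def strahler_all(parent):
    """Strahler numbers of all vertices; parent[v] is the father of v, -1 for the root."""
    n = len(parent)
    degree = [0] * n
    for p in parent:
        if p >= 0:
            degree[p] += 1
    received = [0] * n   # counter 1: how many child values a node has received
    best = [0] * n       # counter 2: running maximum of (child value + offset)
    sigma = [0] * n
    buckets = [[] for _ in range(n + 1)]  # buckets[k]: vertices of value k, not yet transmitted
    for v in range(n):
        if degree[v] == 0:
            sigma[v] = 1
            buckets[1].append(v)
    for k in range(1, n + 1):
        bucket = buckets[k]
        i = 0
        while i < len(bucket):            # the bucket may grow while it is scanned
            v = bucket[i]
            i += 1
            p = parent[v]
            if p < 0:
                continue                   # the root has nobody to inform
            # Values arrive in increasing order, so the value received after
            # received[p] others has rank degree[p] - 1 - received[p] in decreasing order.
            best[p] = max(best[p], k + degree[p] - 1 - received[p])
            received[p] += 1
            if received[p] == degree[p]:
                sigma[p] = best[p]         # always >= k, so no bucket is revisited
                buckets[sigma[p]].append(p)
    return sigma


print(strahler_all([-1, 0, 0, 1, 1]))
```

Each vertex is appended to exactly one bucket and removed from it once, and each transmission costs constant time, which is where the linear bound comes from in this sketch.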

3.2. The set of Strahler values of a tree 

In some of the applications described in Section 6, one may have to select a subtree
of a given tree, for which the Strahler valuation has already been computed, and
sort its nodes according to their Strahler numbers. Thus, it is interesting to know
that this sorting of $N$ nodes can be performed in $\Theta(N)$ time. This will be a
consequence of the following lemma, which, incidentally, is the best counterpart
to the property of Strahler numbers on binary trees stating that a (binary) tree
with Strahler number $k$ must have at least $2^k - 1$ vertices.

Lemma 3.1. Let $T$ be a tree and $E(T)$ denote the set of Strahler values of vertices
in $T$. Then

$$\sum_{k \in E(T)} k \leq |T| \qquad (1)$$

Proof: We prove the lemma by induction on the depth of trees. If $T$ has depth 1,
then its root has some number $k$ of children, and Strahler number $k$. Thus equality
occurs in (1), unless $k = 1$, in which case the inequality is strict.

Now assume (1) holds for all trees with depth at most $h$, and consider a tree
$T$ with depth $h + 1$. Assume the root of $T$ has $p$ children, which are the roots of
subtrees $T_1, \ldots, T_p$ (with decreasing Strahler values). If $\sigma(T) = \sigma(T_1)$, then there
is nothing to prove since $E(T) = \cup_i E(T_i)$. Thus, we assume $\sigma(T) = k > \sigma(T_1)$. By
the recursive definition of the Strahler numbers, this ensures there is some integer
$i$, $2 \leq i \leq p$, such that $\sigma(T_i) \geq k - i + 1$. Thus, we have

$$k - 1 \geq \sigma(T_1) \geq \cdots \geq \sigma(T_i) \geq k - i + 1,$$

so that at least two of the subtrees $T_j$, $j \leq i$, have the same Strahler value
$k' \geq k - p + 1$. Also, the value 1 must certainly appear in each of the sets $E(T_i)$, since
all leaves have Strahler value 1.

• if $k' > 1$, then we have $\sum_{1 \leq i \leq p} \sum_{\ell \in E(T_i)} \ell \geq k' + p - 1 + \sum_{\ell \in \cup_i E(T_i)} \ell$ which, by
induction, proves that inequality in (1) is strict,

• if $k' = 1$, then $k = p$, and we have $\sum_{1 \leq i \leq p} \sum_{\ell \in E(T_i)} \ell \geq p - 1 + \sum_{\ell \in \cup_i E(T_i)} \ell$, which also
proves (1) by induction.

From this lemma and the fact that any set of $k$ positive integers has a sum
at least $k(k+1)/2$, we deduce:

Corollary 3.2. Let $T$ be a tree; then $|E(T)| \leq \sqrt{2|T|}$.
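Lemma 3.1 and Corollary 3.2 can be checked empirically on random trees. The sketch below is only a sanity check under stated assumptions: trees are random recursive trees encoded as parent arrays with parent[v] < v (a hypothetical choice that makes a single backward pass sufficient), and the helper names are illustrative.

```python
import random


def strahler_values(parent):
    """Strahler numbers for a tree with parent[0] == -1 and parent[v] < v otherwise."""
    n = len(parent)
    children = [[] for _ in range(n)]
    for v in range(1, n):
        children[parent[v]].append(v)
    sigma = [1] * n
    for v in range(n - 1, -1, -1):  # children carry larger labels than their father
        ordered = sorted((sigma[c] for c in children[v]), reverse=True)
        if ordered:
            sigma[v] = max(value + rank for rank, value in enumerate(ordered))
    return sigma


def check_bounds(n, seed):
    """True if the sum and the number of distinct Strahler values obey both bounds."""
    rng = random.Random(seed)
    parent = [-1] + [rng.randrange(v) for v in range(1, n)]
    distinct = set(strahler_values(parent))
    return sum(distinct) <= n and len(distinct) ** 2 <= 2 * n


print(all(check_bounds(n, seed) for n in range(1, 200) for seed in range(5)))
```

The second condition is the corollary written without a square root, which avoids floating-point comparisons.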

A bijective proof of this result can be found in [1]. The number of different
values of Strahler numbers in a given tree $T$ has a sub-linear upper bound, and
these values are in $[1, N]$. Thus the complexity of sorting all nodes becomes linear
(say, by using a table similar to the one we described for the Strahler computation,
while simultaneously maintaining a balanced tree of the values appearing; each
value will have to be inserted only once in this tree, resulting in a total complexity
of $O(\sqrt{N} \log N)$ for creating and maintaining it), and we have







Theorem 3.3. The nodes of any tree of size $N$ can be sorted according to their
Strahler numbers in $\Theta(N)$ time and space.



4. Branching ratios for binary trees 

In this section, we prove a weaker (asymptotic) version of Theorem 2.2 [18, 15] so 
as to demonstrate the method that we use for more general families. Because D is 
fixed, we will omit it in all the formulas. We define the following sets : 

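$$\mathcal{S}_k = \{t \in \mathcal{T}_D : \sigma(t) = k\}, \quad \mathcal{L}_k = \{t \in \mathcal{T}_D : \sigma(t) < k\}, \quad \mathcal{G}_k = \{t \in \mathcal{T}_D : \sigma(t) > k\}.$$

Note that $\beta_k$ is identically 0 on $\mathcal{L}_k$ and identically 1 on $\mathcal{S}_k$. Next, consider
the generating functions

$$F(x) = \sum_{t \in \mathcal{T}_D} x^{|t|}, \quad S_k(x, y) = y \sum_{t \in \mathcal{S}_k} x^{|t|}, \quad L_k(x) = \sum_{t \in \mathcal{L}_k} x^{|t|}, \quad G_k(x, y) = \sum_{t \in \mathcal{G}_k} x^{|t|} y^{\beta_k(t)}.$$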

Thus, the variable x always counts size, while y corresponds to the number of 
segments of Strahler value k. Throughout the rest of this paper, whenever we have 
a bivariate generating function F(x, y), F{x) denotes the corresponding univariate 
generating function F(x, 1). 

The method consists in the following steps. 

• Write out an equation for $G_k(x, y)$;

• consider the derivative of $G_k(x, y)$ with respect to $y$: $\partial G_k/\partial y(x, 1) = \sum_{t \in \mathcal{G}_k} \beta_k(t) x^{|t|}$;

• find an asymptotic expression for $\partial G_k/\partial y(x, 1)$, allowing us to derive
asymptotic expressions of the form $B_{k,n} = c_k n^{-1/2} \rho^{-n} (1 + o(1))$;

• compute the branching ratio as $c_k/c_{k+1}$.

In what follows, we will omit the x and y variables in the series as soon as there 
is no ambiguity. Let us now see what happens on the set of binary trees. In this 
case, D = {0, 2}. In the first step, write out the equation. 

$$G_k = x\,(2 L_{k-1} G_k + 2 S_{k-1} G_k + S_k^2 + 2 S_k G_k + G_k^2) \qquad (2)$$

Each monomial in the right-hand side of (2) corresponds to one of the possible
configurations of Strahler values for the two children of a node of Strahler value
higher than $k$ (see Figure 2). In a second step, we replace $S_k(x)$ by $y S_k(x)$ to
account for the number of segments of Strahler value $k$, and differentiate to get

$$\frac{\partial G_k}{\partial y}(x, 1) = \frac{2x\,(G_{k-1}(x, 1) - G_k(x, 1))^2}{1 - 2x F(x)} \qquad (3)$$

The series $F(x)$ is an algebraic series (this is the well-known Catalan series)
with square root type dominant singularities, and each $S_k(x)$ and $L_k(x)$, as a
rational series, has only poles as singularities. This implies that the radius of
convergence of $F$ is strictly less than that of each $S_k$ or $L_k$ and, in turn, that
$G_k(x) = F(x) - L_{k+1}(x)$ has the same singularity structure as $F(x)$. Furthermore,
the denominator in (3) vanishes at the singularities, while the numerator takes a
finite, nonzero value.

As a result, the singularities of $\partial G_k/\partial y(x, 1)$ are of the inverse square root
type, so that the coefficients $B_{k,n}$ have an asymptotic of the form

$$B_{k,n} = c_k n^{-1/2} \rho^{-n} (1 + o(1)),$$







where $c_k$ is proportional to $(G_{k-1}(1/4, 1) - G_k(1/4, 1))^2$.

Step 4 consists in proving that $\lim_{k \to \infty} c_{k-1}/c_k = 4$; this is straightforward
once one notices that (2) can be rewritten as

$$\frac{G_k(x, 1)}{G_{k-1}(x, 1)} = \frac{x\,(S_k(x, 1) - G_k(x, 1))}{1 - 2x F(x)},$$

from which elementary singularity analysis entails that the ratio converges to 1/2
as $x$ goes to $\pm 1/2$.
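For small sizes, the branch counts $B_{k,n}$ of Theorem 2.2 can be computed exactly, which gives a concrete feel for the ratio. The sketch below is an independent check rather than the method of this section. It assumes trees are indexed by their number m of internal nodes (size 2m + 1), and it relies on the observation that, in a binary tree, each branch of order k >= 2 ends at a unique vertex whose two children both have value k - 1, while each leaf is a branch of order 1. The function names are illustrative.

```python
from math import comb


def catalan(m):
    return comb(2 * m, m) // (m + 1)


def branch_totals(max_k, m_max):
    """B[k][m]: total number of branches of Strahler order k over all binary
    trees with m internal nodes, for 1 <= k <= max_k and 0 <= m <= m_max."""
    s = [[0] * (m_max + 1) for _ in range(max_k + 1)]  # s[k][m]: trees of Strahler number k
    s[1][0] = 1                                        # the single leaf
    lower = [0] * (m_max + 1)                          # trees of Strahler number < k
    B = [[0] * (m_max + 1) for _ in range(max_k + 1)]
    B[1] = [(m + 1) * catalan(m) for m in range(m_max + 1)]  # one branch per leaf
    for k in range(2, max_k + 1):
        lower = [lower[m] + s[k - 1][m] for m in range(m_max + 1)]
        # start[m]: trees whose root has two children of Strahler number k - 1
        start = [0] * (m_max + 1)
        for m in range(1, m_max + 1):
            start[m] = sum(s[k - 1][i] * s[k - 1][m - 1 - i] for i in range(m))
        for m in range(1, m_max + 1):
            # root of value k: a branch start, or one child of value k and one lower child
            s[k][m] = start[m] + 2 * sum(s[k][i] * lower[m - 1 - i] for i in range(m))
        for m in range(m_max + 1):
            # a subtree with j internal nodes can hang from any leaf of a tree with m - j
            B[k][m] = sum(start[j] * (m - j + 1) * catalan(m - j) for j in range(1, m + 1))
    return B


B = branch_totals(4, 200)
print(B[1][200] / B[2][200], B[2][200] / B[3][200])
```

In this indexing the first ratio has the closed form 2(2m - 1)/m, which makes the limit 4 visible for k = 1; the second printed ratio is left to the reader to inspect.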



5. Branching ratio for trees of $\mathcal{T}_D$ with finite $D$

In this section, we assume that $D$ is any finite set of integers that contains 0 and
at least one integer larger than 1. The family of trees we are interested in is the
set $\mathcal{T}_D$ of plane rooted trees where the arity of each node lies in $D$.

Due to space constraints, we only sketch the proofs; the method is very similar 
to what was used in the previous section. 

The generating function for all trees in $\mathcal{T}_D$ thus satisfies the polynomial equation

$$F(x) = x \sum_{d \in D} F(x)^d = x \phi_D(F(x)). \qquad (4)$$

It is a classical result [7, 9] that this series converges as an analytic function
inside the complex disc $|x| < \rho_D$, where $\rho_D = \tau/\phi_D(\tau)$ and $\tau$ is the unique positive
real solution to the equation $\phi_D(\tau) = \tau \phi_D'(\tau)$. Furthermore, $F(x)$ has a single$^1$
dominant singularity of the square root type at $\rho_D$ and this translates into an
asymptotic expression $\sim c_D \rho_D^{-n} n^{-3/2}$ for the number of $D$-trees of size $n$.
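The characteristic equation for $\tau$ is easy to solve numerically for any finite $D$. The following sketch uses plain bisection; the function name `singularity` and the tolerance are illustrative assumptions, and the approach relies on the fact that $\tau \phi_D'(\tau) - \phi_D(\tau) = \sum_{d \in D} (d-1) \tau^d$ is negative at 0 and increasing on the positive reals when $D$ contains 0 and some integer larger than 1.

```python
def singularity(degrees, tol=1e-13):
    """Return (tau, rho) with phi(tau) = tau * phi'(tau) and rho = tau / phi(tau)."""
    degrees = sorted(set(degrees))
    if 0 not in degrees or max(degrees) < 2:
        raise ValueError("D must contain 0 and at least one integer larger than 1")

    def phi(t):
        return sum(t ** d for d in degrees)

    def gap(t):
        # t * phi'(t) - phi(t) = sum over d of (d - 1) * t**d
        return sum((d - 1) * t ** d for d in degrees)

    lo, hi = 0.0, 1.0
    while gap(hi) < 0:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    tau = (lo + hi) / 2
    return tau, tau / phi(tau)


print(singularity({0, 2}))      # binary trees
print(singularity({0, 1, 2}))   # unary-binary trees
```

For binary trees the equation reads $1 + \tau^2 = 2\tau^2$, so $\tau = 1$ and $\rho_D = 1/2$, in agreement with the singularities at $\pm 1/2$ mentioned in the previous section.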

For each integer $k \geq 1$, $\mathcal{T}_D$ can be partitioned into 3 sets $\mathcal{S}_k, \mathcal{G}_k, \mathcal{L}_k$ with
respective generating functions $S_k(x)$, $G_k(x)$ and $L_k(x)$. Similarly to the binary
situation, $S_k$ and $L_k$ are both rational power series, so that their poles must all
have moduli strictly larger than $\rho_D$. Thus, each $G_k$ has a square root singularity
at $\rho_D$, with the same amplitude as $F$. This can be interpreted as meaning that,
for finite $k$, almost all large trees have Strahler number more than $k$.

5.1. Generating functions for trees of high Strahler value 

Our goal in this paragraph is to write an equation for $G_k(x)$. If a vertex in a tree
has $d$ children, then, if we only want to discriminate its Strahler number so as to
be able to decide whether it is strictly larger than some value $k$, we only need to
discriminate whether the Strahler value of each child is larger than $k$, or one of
the values between $k - d + 2$ and $k$, or smaller than $k - d + 2$; all values lower than
$k - d + 2$ are equivalent in this regard, because even $d$ children each with Strahler
value $k - d + 1$ will only result in a Strahler value $k$ for the root node. This means
that the generating function for all trees where the root has exactly $d$ children
and Strahler value larger than $k$ can be expressed as $x$ times a polynomial $\mathcal{Q}_d$ in
the variables $G_k, S_k, \ldots, S_{k-d+2}, L_{k-d+2}$. This polynomial $\mathcal{Q}_d$ can be obtained by
expanding (formally) the expression $(G_k + S_k + \cdots + S_{k-d+2} + L_{k-d+2})^d$ into a


$^1$In fact, there is a singularity at each complex value of the form $\rho_D e^{2 i \pi j / d'}$, where $d'$ is the
greatest common divisor of elements of $D$; when $d' > 1$, the series $F(x)/x$ is invariant upon the
change of variable $x \mapsto x e^{2 i \pi / d'}$, and $D$-trees always have a size of the form $d'k + 1$ for some
$k$. The proofs in this paper assume that $d' = 1$, so as to make notations easier; they can be
extended to the general case easily.







sum of monomials, and, in the resulting sum, selecting only those monomials that
lead to a Strahler number strictly over $k$.

By performing the global change of variables $S_i = G_{i-1} - G_i$, $L_i = F - G_{i-1}$,
we can also express $\mathcal{Q}_d$ as a polynomial $\mathcal{P}_d$ in the variables $G_k, \ldots, G_{k-d+1}, F$;
this will later yield somewhat simpler equations. Note that, contrary to $\mathcal{Q}_d$, $\mathcal{P}_d$ has
negative coefficients. Once this is done, summing over all possible degrees for the
root of a $D$-tree yields the following equation for the series $G_k(x)$:

$$G_k(x) = x \sum_{d \in D} \mathcal{P}_d(F(x), G_k(x), \ldots, G_{k-d+1}(x)) \qquad (5)$$

(with the provision that $G_i(x) = F(x)$ whenever $i \leq 0$).

Our main tool is a recurrence relation between the polynomials $\mathcal{P}_d$ themselves,
which enables us to obtain asymptotic results on the branching ratios of any simply
generated family of trees with a finite set of degrees allowed. We conjecture that
similar results should hold for infinite sets of degrees (or at least a wide class of
them), but we were not able to prove them.

To avoid confusion between the generating functions $F(x)$, $G_k(x)$, $S_k(x)$ and
$L_k(x)$ and the variables of polynomials $\mathcal{Q}_d$ and $\mathcal{P}_d$, we will use lowercase letters
for the latter, and write

$$\mathcal{Q}_d = \mathcal{Q}_d(g, s_0, \ldots, s_{d-2}, \ell), \quad \mathcal{P}_d = \mathcal{P}_d(f, g_0, \ldots, g_{d-1}).$$

We have:

$$\mathcal{Q}_1(g, \ell) = g, \quad \mathcal{Q}_2(g, s_0, \ell) = (g + s_0)^2 + 2 g \ell,$$

or, equivalently,

$$\mathcal{P}_1(f, g_0) = g_0, \quad \mathcal{P}_2(f, g_0, g_1) = g_1^2 - 2 g_0 g_1 + 2 g_0 f.$$

For the first values of $d$, we obtained $\mathcal{P}_d$ through a computer algebra system. These
suggest a recurrence for polynomials:



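Lemma 5.1. For any $d \geq 2$,

$$\mathcal{P}_d(f, g_0, \ldots, g_{d-1}) = g_{d-1}^d + d \int_{g_{d-1}}^{f} \mathcal{P}_{d-1}(t, g_0, \ldots, g_{d-2})\,dt \qquad (6)$$

and, equivalently,

$$\mathcal{Q}_d(g, s_0, \ldots, s_{d-2}, \ell) = (g + s_0 + \cdots + s_{d-2})^d + d \int_{s_{d-2}}^{s_{d-2}+\ell} \mathcal{Q}_{d-1}(g, s_0, \ldots, s_{d-3}, t)\,dt \qquad (7)$$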



Proof: We prove (7); (6) is equivalent under the previously mentioned change of
variables. Recall that $\mathcal{Q}_d$ is the enumeration polynomial for the ways a tree with
Strahler value higher than $k$ can be constructed with a root and $d$ subtrees whose
Strahler values are higher than $k$ (counted by the variable $g$), or any value between
$k$ (counted by $s_0$) and $k - d + 2$ (counted by $s_{d-2}$), or lower than $k - d + 2$ (counted
by $\ell$).

Now consider the set $\mathcal{L}_d$ of words of length $d$ over the alphabet $\{g, k, k-1, \ldots, k-d+2, \ell\}$
(where $g$ stands for "higher than $k$", and $\ell$ stands for "lower
than $k - d + 2$"), that, if they are interpreted as the sequence of Strahler values
for the $d$ children of a node, result in this node having Strahler value higher than
$k$. $\mathcal{Q}_d$ is none other than the commutative image of $\mathcal{L}_d$ when each letter $k - i$ is
mapped to the variable $s_i$. What we need to do is provide a description of $\mathcal{L}_d$ in
terms of $\mathcal{L}_{d-1}$ that, under commutation, can be interpreted as (7).







First, note that all words of length $d$ where $\ell$ does not appear belong to $\mathcal{L}_d$:
having $d$ children, each with Strahler value at least $k - d + 2$, is enough for a vertex
to have Strahler value at least $k + 1$. This justifies the term $(g + s_0 + \cdots + s_{d-2})^d$
in (7), and we now turn to words in $\mathcal{L}_d$ where the letter $\ell$ appears.

Consider a word $w \in \mathcal{L}_d$, with $j + 1$ total occurrences of $\ell$ or $k - d + 2$
($j \geq 0$), and the (multi-)set of $j + 1$ words of length $d - 1$ obtained by first
replacing each $k - d + 2$ with an $\ell$, then removing one of the $\ell$. It is easy to see
that each of these words belongs to $\mathcal{L}_{d-1}$. Inversely, for any word $w' \in \mathcal{L}_{d-1}$ with
$j$ occurrences of $\ell$, the (multi-)set of $d(2^{j+1} - 1)$ words obtained by first inserting an
additional $\ell$ in any of $d$ positions in $w'$, then replacing each $\ell$ with either itself or
$k - d + 2$ - but leaving at least one occurrence of $\ell$, since words in $\mathcal{L}_d$ without $\ell$ are
already accounted for - will produce only words of $\mathcal{L}_d$, each word with $j + 1$ total
occurrences of $k - d + 2$ or $\ell$ being obtained $j + 1$ times. Letting all letters commute,
and summing over all words of $\mathcal{L}_{d-1}$, we see that each monomial $M \ell^j$ in $\mathcal{Q}_{d-1}$
becomes $\frac{d}{j+1} M \left( (\ell + s_{d-2})^{j+1} - s_{d-2}^{j+1} \right)$, which is exactly symbolic integration
with respect to $\ell$ on the interval $[s_{d-2}, s_{d-2} + \ell]$. Summing all contributions, we
get (7).
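Identity (7) can be checked for small $d$ by brute-force enumeration of the words of $\mathcal{L}_d$. The sketch below is only a numerical sanity check under stated assumptions: it evaluates both sides at arbitrary rational values instead of manipulating polynomials symbolically, it picks $k = d$ and one representative Strahler value per letter, and all function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def q_coefficients(d, g, s):
    """Coefficients c[j] of l**j in Q_d(g, s_0, ..., s_{d-2}, l), by enumeration.

    Letter 0 is g ("higher than k"), letters 1..d-1 are s_0..s_{d-2} (Strahler
    values k..k-d+2) and letter d is l ("lower than k-d+2").
    """
    k = d  # assumption: any k >= d works, it keeps the representative values positive
    values = [k + 1] + [k - i for i in range(d - 1)] + [k - d + 1]
    weights = [g] + list(s)
    coeffs = [Fraction(0)] * (d + 1)
    for word in product(range(d + 1), repeat=d):
        ordered = sorted((values[a] for a in word), reverse=True)
        if max(v + i for i, v in enumerate(ordered)) > k:  # the root value exceeds k
            weight = Fraction(1)
            for a in word:
                if a < d:
                    weight *= weights[a]
            coeffs[sum(1 for a in word if a == d)] += weight
    return coeffs


def lhs(d, g, s, l):
    """Q_d evaluated directly from its definition."""
    return sum(c * l ** j for j, c in enumerate(q_coefficients(d, g, s)))


def rhs(d, g, s, l):
    """Right-hand side of (7), integrating Q_{d-1} term by term in its last variable."""
    a = s[-1]  # s_{d-2}
    inner = q_coefficients(d - 1, g, s[:-1])
    integral = sum(c * ((a + l) ** (j + 1) - a ** (j + 1)) / (j + 1)
                   for j, c in enumerate(inner))
    return (g + sum(s)) ** d + d * integral


g, l = Fraction(1, 3), Fraction(2, 7)
for d in range(2, 6):
    s = [Fraction(1, i + 2) for i in range(d - 1)]
    print(d, lhs(d, g, s, l) == rhs(d, g, s, l))
```

Exact rational arithmetic is used so that equality of the two sides is tested without rounding; agreement at one point is of course evidence rather than a proof of the polynomial identity.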

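The following corollary is an easy consequence of the previous lemma and of
the expressions for the first polynomials $\mathcal{P}_d$ and $\mathcal{Q}_d$:

Corollary 5.2. For any $d \geq 1$,

$$\frac{\partial \mathcal{Q}_d}{\partial g}(g, s_0, \ldots, s_{d-2}, \ell) = d\,(g + s_0 + \cdots + s_{d-2} + \ell)^{d-1}. \qquad (8)$$

Furthermore, $\mathcal{P}_d$ is homogeneous of degree $d$ in its variables, and has degree $d - 1$ in
$f$ and degree 1 in $g_0$. Thus, $\mathcal{P}_d = g_0 \mathcal{D}_d + \mathcal{N}_d$, where

$$\mathcal{N}_d(f, g_1, \ldots, g_{d-1}) = g_{d-1}^d + d \int_{g_{d-1}}^{f} \mathcal{N}_{d-1}(t, g_1, \ldots, g_{d-2})\,dt \qquad (9)$$

$$\mathcal{D}_d(f, g_1, \ldots, g_{d-1}) = d \int_{g_{d-1}}^{f} \mathcal{D}_{d-1}(t, g_1, \ldots, g_{d-2})\,dt \qquad (10)$$

$$\mathcal{N}_d = \binom{d}{2} g_1^2 f^{d-2} + O(f^{d-3}) \qquad (11)$$

$$\mathcal{D}_d = d f^{d-1} - d(d-1) g_1 f^{d-2} + O(f^{d-3}) \qquad (12)$$

In light of this, (5) solves to

$$G_k(x) = \frac{x \sum_{d \in D} \mathcal{N}_d(F(x), G_{k-1}(x), \ldots, G_{k-d+1}(x))}{1 - x \sum_{d \in D} \mathcal{D}_d(F(x), G_{k-1}(x), \ldots, G_{k-d+1}(x))} \qquad (13)$$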


Notice that the leading terms (in powers of $F(x)$) in the denominator of (13)
exactly cancel the term of 1 when $x = \rho_D$, since

$$x \sum_{d \in D} d F(x)^{d-1} = 1$$

is exactly the equation for the singularities of $F$. Also, note that (from (11-12))
the remaining leading terms (in powers of $F(x)$) in the numerator and denominator
are in a ratio that is exactly $G_{k-1}(x)/2$. Since, for any $x$, all the $G_{k-i}$ variables of the
polynomials $\mathcal{N}_d$ and $\mathcal{D}_d$ tend to 0 as $k$ tends to $+\infty$, this suggests that the ratio
$G_k(\rho_D)/G_{k-1}(\rho_D)$ converges to 1/2 as $k$ goes to $+\infty$. In fact, the only ingredient
missing to complete the proof is to justify using the powers of the various $G_{k-i}$






variables to select the dominant terms in (13). This is the reason for the following 
lemma. 



Lemma 5.3. Let $D$ be a finite set of allowed degrees, $0 \in D$, $D \not\subset \{0, 1\}$, and let
$d' = \max D$. Then, for any $k \geq 1$,

$$G_{k-1}(\rho_D) > G_k(\rho_D) > \frac{1}{d'} G_{k-1}(\rho_D). \qquad (14)$$

Proof: (sketch) The first part is an obvious consequence of the fact that $G_{k-1}$ sums
over more trees than $G_k$; the second part is proved by defining $\mathcal{E}_d = -\mathcal{D}_d + d f^{d-1}$,
and using (6) to prove, by induction on $d$, that

$$\mathcal{N}_d(f, g_1, \ldots, g_{d-1}) \geq \frac{1}{d}\, g_1\, \mathcal{E}_d(f, g_1, \ldots, g_{d-1}) \geq 0 \qquad (15)$$

holds as soon as $0 \leq g_1 \leq \cdots \leq g_{d-1} \leq f$. The lemma is then proved by a convex
combination of (15), observing that

$$G_k(\rho_D) = \frac{\sum_{d \in D} \mathcal{N}_d(F(\rho_D), G_{k-1}(\rho_D), \ldots, G_{k-d+1}(\rho_D))}{\sum_{d \in D} \mathcal{E}_d(F(\rho_D), G_{k-1}(\rho_D), \ldots, G_{k-d+1}(\rho_D))}.$$

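Corollary 5.4. For any finite $D$, let $\gamma_k$ (respectively, $\sigma_k$) denote the singular value
of $G_k$: $\gamma_k = G_k(\rho_D)$ (respectively, $\sigma_k = S_k(\rho_D)$). Then,

$$\lim_{k \to \infty} \frac{\gamma_k}{\gamma_{k-1}} = \frac{1}{2}, \quad \lim_{k \to \infty} \frac{\sigma_k}{\gamma_k} = 1, \quad \lim_{k \to \infty} \frac{\sigma_k}{\sigma_{k-1}} = \frac{1}{2}.$$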



5.2. The branching ratio 

We now turn to the task of evaluating the branching ratio for the Strahler numbers;
that is, we must compare the total numbers, in $D$-trees of size $n$, of branches of
Strahler order $k$ and $k + 1$. Let $r_k$ denote the limit, as $n$ tends to infinity, of this
ratio $B_{k,n}/B_{k+1,n}$. We will prove that $r_k$ tends to 4 as $k$ tends to infinity - a
somewhat weaker result than the previously known fact (for binary trees) that
$r_k = 4$. To do this, we consider the bivariate generating function

$$F_k(x, y) = \sum_{T \in \mathcal{T}_D} x^{|T|} y^{\beta_k(T)}.$$

Clearly, $F_k(x, 1) = F(x)$. Using standard generating function manipulations and
notations, the total number of branches of order $k$ in $D$-trees of size $n$ is exactly

$$B_{k,n} = [x^n] \frac{\partial F_k}{\partial y}(x, 1).$$

We have $F_k(x, y) = L_k(x) + y S_k(x) + G_k(x, y)$. Using the notations of the previous
paragraph,

$$G_k(x, y) = x \sum_{d \in D} \mathcal{Q}_d(G_k(x, y), y S_k(x), S_{k-1}(x), \ldots, S_{k-d+2}(x), L_{k-d+2}(x)). \qquad (16)$$

Note that (16) proves that $G_k(x, y)$, and by extension $F_k(x, y)$, are algebraic series
in their two variables, since the various $S_i$ and $L_i$ series are all rational. If we
differentiate (16) with respect to $y$ and set $y = 1$, we get an equation which we
can immediately solve for $T_k(x) = \partial F_k/\partial y(x, 1)$:

$$T_k(x) = S_k(x) + \frac{x\, S_k(x) \sum_{d \in D} \frac{\partial \mathcal{Q}_d}{\partial s_0}(G_k(x, 1), S_k(x), \ldots, S_{k-d+2}(x), L_{k-d+2}(x))}{1 - x \sum_{d \in D} \frac{\partial \mathcal{Q}_d}{\partial g}(G_k(x, 1), S_k(x), \ldots, S_{k-d+2}(x), L_{k-d+2}(x))} \qquad (17)$$







The denominator in (17) is exactly 1 - = F{x)/{xF'{x)). 

Since this vanishes at the dominant singularity $x = \rho_D$, and the numerator in
(17) has a finite, nonzero limit, using a transfer lemma [7, 9], we get the following
estimates:

• Bk,n ^ for some positive constant 6^; 

• the branching ratio for branches of Strahler number k is 

bk-\-l ^k-\-l • • • 5 ^k~d-\-3i Lk-d-\-3{PD)) 

The first factor tends to 2 by Corollary 5.4. In the second factor, the variables
of the polynomial in the numerator and denominator are in an asymptotic ratio of 2
(also by Corollary 5.4), and tend to 0 as $k$ tends to infinity, except the last variable
$L_j(\rho_D)$, which tends to $F(\rho_D)$. Since all the polynomials involved are homogeneous with
a nonzero term of degree 1 in the variable the whole second factor also tends to
2 as $k$ tends to infinity.

Theorem 5.5. The branching ratio of trees of any family of finite degree D is 
asymptotically 4. 



6. Applications to information visualization 

Originally, Strahler numbers were designed to highlight river shapes. The extension
that we have given is in the same spirit: highlighting information from tree
structures. We show here two results: one concerns navigation tools [2] and the
other file system retrieval [4]. When one visualizes large trees on a screen, displaying
the elements (vertices and edges) can take more than fifty milliseconds ($J_0$). In
order to have a high performance visualization system, one needs to accept events
coming from the user (mouse movements, keyboard actions) within $J_0$. In order
to increase performance, the method proposed by Wills [24] consists in predicting
the number of elements that can be displayed during $J_0$. Thus the system
incrementally displays the tree (in parts, each associated with one interval $J_0$) and ensures that the
reaction time of the system is less than $J_0$ without any request to the clock.

The problem now is to select the best elements to display so that the user
does not get lost in the data. In our method, we propose to display the elements
in decreasing Strahler order of the vertices. The figures show the result for 520000
vertices: Figure 3 shows what happens when no smart choice is made, and Figure 4
shows the result when applying our method.

Another field of interest is finding similar structures in an information system.
Similar here means not identical but nearly so: the user may choose a
"degree of similarity". One real question posed at the InfoVis'03 contest [4] was
on file systems (which are trees): given two views $S_1$ and $S_2$ of the same file system
taken at two different times, show the user which parts have changed. Work
based on arity has already been done by Zemlyachenko [26] and more recently by
Dinitz et al. [17]. Their algorithm gives a partition of subtrees into isomorphism
equivalence classes. It is proved to be linear. However, it only detects isomorphism
and does not provide a measure of similarity for subtrees. In a recent work [4], we
used a combination of three parameters (number of vertices, arity, Strahler number)
in order to detect quasi-isomorphic subtrees. Figure 5 shows the trees
representing the same file system at two different times ($S_1$ and $S_2$). Each tree has
about 80000 vertices. $S_1$ and $S_2$ look similar. Thus, roughly, with the "isomorphism
level" chosen in this view by the user, he can consider that no big changes have
occurred during the chosen time. Applying our algorithm recursively, we detect
subtrees that are different. One can see in Figure 6 that some smooth changes
have occurred on the labels even if the drawing of the tree is the same. This is
due to the navigation tool that we have described above. In the real version [4], we
use only colors because same colors suggest same subtrees. The similar big disks
in Figure 6 are different and have 951 and 755 nodes, respectively. Figure 7 was
obtained by suppressing the two disks after inspection. We have only displayed
a similar part of the two subtrees. All these methods have been implemented in
Tulip [3] and work on real examples.

Figure 3. Display without Strahler

Figure 4. Strahler-guided display

Figure 5. Two file systems: overview

Figure 6. Two file systems: first zoom



Figure 7. A final zoom on both file systems

References

[1] D. Auber. Outils de visualisation de larges structures de données. PhD dissertation,
Université Bordeaux 1, 2002.

[2] D. Auber. Using Strahler numbers for real time visual exploration of huge graphs. In
International Conference on Computer Vision and Graphics, volume 1-3 of Journal
of WSCG, pages 56-69, 2002.

[3] D. Auber. Tulip: A huge graph visualisation framework. In P. Mutzel and M. Jünger,
editors, Graph Drawing Softwares, Mathematics and Visualization, pages 105-126.
Springer-Verlag, 2003.

[4] D. Auber, M. Delest, J.P. Domenger, P. Ferraro, and R. Strandh. EVAT: Environment
for visualization and analysis of trees. In IEEE Symposium on Information
Visualisation Contest, volume www.cs.umd.edu/hcil/iv03contest/, pages 124-126,
2003.

[5] J. Bourdon, M. Nebel, and B. Vallée. On the stack-size of general tries. Theoretical
Informatics and Applications, 35(4):163-185, 2000.

[6] A.P. Ershov. On programming of arithmetic operations. Communications of the
ACM, 1(8):3-6, 1958.

[7] P. Flajolet and A. Odlyzko. Singularity analysis of generating functions. SIAM J.
Discrete Math., 3:216-240, 1990.

[8] P. Flajolet, J.C. Raoult, and J. Vuillemin. The number of registers required for
evaluating arithmetic expressions. Theoretical Computer Science, 9:99-125, 1979.

[9] P. Flajolet and R. Sedgewick. The average case analysis of algorithms: Complex
asymptotics and generating functions. Rapport de recherche 2026, INRIA, 1993.

[10] J. Françon. Sur le nombre de registres nécessaires à l'évaluation d'une expression
arithmétique. RAIRO Informatique théorique, 18:355-364, 1984.

[11] R.E. Horton. Erosional development of streams and their drainage basins; hydrophysical
approach to quantitative morphology. Bulletin Geological Society of
America, 56:275-370, 1945.

[12] R. Kemp. The average number of registers needed to evaluate a binary tree optimally.
Acta Informatica, 11:363-372, 1979.

[13] A. Meir and J.W. Moon. Erratum: "On an asymptotic method in enumeration". J.
Combin. Theory Ser. A, 52(1):163, 1989.

[14] A. Meir and J.W. Moon. On an asymptotic method in enumeration. J. Combin.
Theory Ser. A, 51(1):77-89, 1989.

[15] J.W. Moon. An extension of Horton's law of stream numbers. Math. Colloq. Univ.
Cape Town, 1980.

[16] M.E. Nebel. A unified approach to the analysis of Horton-Strahler parameters of
binary tree structures. Random Struct. Algorithms, 21(3-4):252-277, 2002.

[17] Y. Dinitz, A. Itai, and M. Rodeh. On an algorithm of Zemlyachenko for subtree
isomorphism. Information Processing Letters, 70(3):141-146, 1999.

[18] R. Shreve. Statistical law of stream numbers. J. Geol., 74:178-186, 1966.

[19] A.N. Strahler. Hypsometric analysis of erosional topography. Bulletin Geological Society
of America, 63:1117-1142, 1952.

[20] J. Vannimenus and X.G. Viennot. Combinatorial analysis of ramified patterns. J.
Stat. Phys., 54:1529-1538, 1989.

[21] G. Viennot. Trees everywhere. In A. Arnold, editor, Colloquium on Trees in Algebra
and Programming, Lecture Notes in Computer Science 431, pages 18-41.
Springer-Verlag, 1990.

[22] G. Viennot, G. Eyrolles, N. Janey, and D. Arques. Combinatorial analysis of ramified
patterns and computer imagery of trees. In SIGGRAPH Conference, volume 23 of
Computer Graphics, pages 31-40, 1989.

[23] X.G. Viennot. A Strahler bijection between Dyck paths and planar trees. In R. Cori
and O. Serra, editors, 11th Formal Power Series and Algebraic Combinatorics, pages
573-584, 1999.

[24] G.J. Wills. NicheWorks: Interactive visualization of very large graphs. In Giuseppe
Di Battista, editor, 5th Symp. Graph Drawing, volume 1353 of Lecture Notes
in Computer Science, pages 403-414. Springer-Verlag, 1997.

[25] D. Zeilberger. A bijection from ordered trees to binary trees that sends the pruning
order to the Strahler number. Discrete Math., 82:89-92, 1990.

[26] V.N. Zemlyachenko. Determining tree isomorphism. Seminar on Combinatorial
Mathematics, pages 54-60, 1971.

David Auber 

LaBRI, Universite Bordeaux 1 
auber@labri.fr 

Jean-Philippe Domenger 

LaBRI, Universite Bordeaux 1 
domenger@labri.fr 

Maylis Delest 

LaBRI, Universite Bordeaux 1 
maylis@labri.fr 

Philippe Duchon 

LaBRI, Universite Bordeaux 1 
duchon@labri.fr

Jean-Marc Fedou 

I3S, Universite de Nice-Sophia Antipolis 
fedou@unice.fr 




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



An Average-Case Analysis of Basic Parameters
of the Suffix Tree

Julien Fayolle 



ABSTRACT: The LZ’77 algorithm offers one of the best available rates for 
lossless data compression. It is based on the suffix tree structure. Our aim is to 
obtain the asymptotics of the mean size and external path length of a suffix tree by 
comparing them to those of a trie or digital tree. The core problem lies within the 
set on which we build the suffix tree. This set is correlated, so we cannot use the 
methods that have proved efficient for the trie. The proof relies on combinatorics, 
generating functions, and complex analysis. 



1. Introduction 

The trie or digital tree data structure [7, 8, 9] manages dictionaries efficiently.
Queries for an existing word or insertion of a new word in the dictionary can 
be performed in expected logarithmic time in the number of items stored in the 
dictionary. 

While the LZ’78 data compression algorithm is based on digital search trees, 
we will focus on suffix trees, a particular kind of trie which lies at the very heart of 
the popular and efficient LZ’77 [11] lossless compression algorithm. This algorithm 
is behind the gzip software. 

In a groundbreaking article, Jacquet and Szpankowski [6] developed
a sophisticated “string ruler” approach to obtain results on the asymptotics of 
parameters of suffix trees. This paper uses Jacquet and Szpankowski’s lead idea: 
asymptotically the mean size and external path length for a trie built on n words 
and for a suffix tree built on n suffixes are very close. 

This paper’s aim is threefold: first to provide simpler proofs of Jacquet and 
Szpankowski’s results than they do, second to obtain more accurate results, and 
third to lay the groundwork for a study of suffix trees under a broader model. 

1.1. Tries 

We first define recursively a trie on a set $X$ of infinite words on the $m$-ary alphabet
$\mathcal{A} = \{a_1, \ldots, a_m\}$:

\[
\mathrm{trie}(X) =
\begin{cases}
\emptyset & \text{if } |X| = 0,\\
\square & \text{if } |X| = 1,\\
(\bullet, \mathrm{trie}(X \setminus a_1), \ldots, \mathrm{trie}(X \setminus a_m)) & \text{else,}
\end{cases}
\]

where $X \setminus a$ is defined as the set of words starting with the letter $a$ whose first
letter is removed.

From now on the alphabet will be binary, $\mathcal{A} = \{0, 1\}$, a choice that entails no
loss of generality.







1.2. Source model 

How do we obtain the infinite words constituting the set $X$? By a device, called a
source, that randomly produces symbols from the binary alphabet at regular time intervals.

The type of source we are dealing with has two main characteristics: it is
randomised, meaning that symbols are emitted with probabilities; and memoryless, meaning that the emission
of a symbol at a given time is independent of the symbols already emitted. In
this work, the probability of occurrence of a symbol is independent of when it is
emitted. This specifies the memoryless source model, a.k.a. the Bernoulli model.

Definition 1.1. The probability that the source emits a sequence of symbols starting
with the pattern $w$ is denoted $p_w$, and called the occurrence probability. For a memoryless
source, $p_w$ is the product of the probabilities of the letters composing $w$.

We denote by $p$ the probability of emitting the symbol 0 and by $q$ the probability
of emitting 1. A source is said to be symmetric if $p = q = 1/2$ and biased otherwise.
For convenience, we adopt the convention that $p$ is the larger of the two
probabilities.



1.3. Parameters under the spotlights 

For each internal node of a trie, the successive left (encoded by 0) or right (encoded
by 1) steps taken to go from the root to the node encode the prefix associated with
this node. An internal node exists within the trie (relative to an infinite complete
binary tree) if there are at least two words in $X$ starting with the prefix associated
with this node.

For a pattern $w$, $N_w(X)$ is introduced as the number of words of $X$ starting
with $w$. Sometimes, we write $N_w$ for $N_w(X)$.

Let the parameters $S$ and $P$ denote the size and the external path length.
Both can be rewritten in terms of $N_w$ for a trie built on the set $X$ by

\[ S(X) := \sum_{w \in \mathcal{A}^*} [\![ N_w \ge 2 ]\!], \]

\[ P(X) := \sum_{w \in \mathcal{A}^*} N_w\, [\![ N_w \ge 2 ]\!], \]

where $[\![ P ]\!] = 1$ if $P$ is true, 0 else. This is Iverson's bracket notation.
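
The two sums above can be evaluated directly on a small example. The following is a minimal sketch, assuming finite binary strings stand in for the infinite words (they must be distinct and none may be a prefix of another); the function name `trie_parameters` is hypothetical and not from the paper.

```python
from collections import Counter


def trie_parameters(words):
    """Return (size, external path length) of the trie built on `words`.

    Assumption of this sketch: `words` are finite binary strings standing in
    for infinite words, pairwise distinct, none a prefix of another.
    """
    # prefix_counts[w] = N_w, the number of words starting with the pattern w
    # (the empty pattern included).
    prefix_counts = Counter(
        word[:length] for word in words for length in range(len(word) + 1)
    )
    # An internal node exists for every pattern shared by at least two words.
    size = sum(1 for count in prefix_counts.values() if count >= 2)
    # Each such pattern contributes N_w to the external path length.
    path_length = sum(count for count in prefix_counts.values() if count >= 2)
    return size, path_length
```

For the four words 000, 010, 011 and 110, the internal nodes are the empty pattern, 0 and 01, and the leaves sit at depths 1, 2, 3 and 3.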

1.4. Suffix trees 

Let $y$ be an infinite word on the alphabet $\mathcal{A}$ and $Y_n$ the set of the first $n$ suffixes
of $y$ (we consider $y$ to be its own suffix). The suffix tree of index $n$ based on $y$ is
nothing but the trie built on $Y_n$ (this operation is valid since $Y_n$ is a set of infinite
words over the alphabet $\mathcal{A}$).

Since tries and suffix trees are based on the same recursive decomposition,
the expressions for the parameters $S$ and $P$ under consideration are identical. For
a trie on a set $X$, $N_w$ means the number of words starting with the pattern $w$, so
for a trie on the set $Y_n$ (suffix tree on $y$) it coincides with the number of suffixes
(amongst the first $n$ of them) of the word $y$ for which $w$ is a prefix, or equivalently,
the number of occurrences of the pattern $w$ in the first $n$ positions of the word $y$.
We introduce $N_w(y, n)$ as the number of occurrences of the pattern $w$ in the first
$n$ positions of $y$, and we express the size $S$ and the external path length $P$ of a
suffix tree as

\[ S(y, n) := \sum_{w \in \mathcal{A}^*} [\![ N_w(y, n) \ge 2 ]\!], \tag{1} \]

\[ P(y, n) := \sum_{w \in \mathcal{A}^*} N_w(y, n)\, [\![ N_w(y, n) \ge 2 ]\!]. \tag{2} \]
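
Formulas (1) and (2) can be evaluated slice by slice on a finite prefix of $y$. The sketch below is illustrative only: the name `suffix_tree_parameters` is hypothetical, and it assumes the supplied text is long enough to tell the first $n$ suffixes apart (otherwise it raises an error).

```python
from collections import Counter


def suffix_tree_parameters(text, n):
    """Size and external path length of the suffix tree of index n.

    `text` is a finite prefix of the infinite word y (an assumption of this
    sketch). A ValueError is raised when it is too short to separate the
    first n suffixes.
    """
    size, path_length = 0, 0
    length = 0
    while True:
        if n - 1 + length > len(text):
            raise ValueError("text too short to separate the first n suffixes")
        # Occurrences of each pattern of this length at positions 0..n-1.
        counts = Counter(text[i:i + length] for i in range(n))
        repeated = [c for c in counts.values() if c >= 2]
        if not repeated:
            # No pattern of this length repeats, so no longer pattern does.
            return size, path_length
        size += len(repeated)
        path_length += sum(repeated)
        length += 1
```

With the text 0010110 and $n = 4$, the repeated patterns are the empty pattern (4 occurrences), 0 (3 occurrences) and 01 (2 occurrences).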



1.5. Plan 

We recall the results obtained by Knuth [7] on the mean size and external path
length for a trie built on $n$ strings:

\[ E_n(S) = \frac{n}{H}\,(1 + \epsilon'(n)) + O(\log n), \tag{3} \]

\[ E_n(P) = \frac{n \log n}{H} + (K + \epsilon(n))\, n + O(\log n), \tag{4} \]

where $\epsilon$ and $\epsilon'$ are oscillating functions of very small amplitude around 0. These
results were proved in a less intricate way in [1].

The purpose of the next section is to obtain, via Guibas and Odlyzko's work
and complex analysis, an asymptotic expression of the mean size, $E_n(S)$, and
mean external path length, $E_n(P)$, for suffix trees built on $n$ suffixes. There are
two probabilistic models on tries: one can build them on a set of size $n$ (fixed-size
model, a.k.a. Bernoulli) or on a set whose size follows a Poisson law of parameter
$z$ (Poisson model). The difference between the mean size for tries under the Poisson model of
parameter $n$ and under the Bernoulli model of parameter $n$ is fairly small (of order $\log n$);
this is also true for the mean external path length. Section 3 is dedicated to studying
the difference $\Delta$ between the mean path length for tries under the Poisson model of
parameter $n$, $E_{\mathcal{P}(n)}(P)$, and for a suffix tree built on $n$ suffixes, $E_n(P)$. Our aim
is to show that $\Delta$ is small. Section 3 focuses on the external path length but the
same techniques can be used for size.



2. Asymptotics 

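Since we know from the previous part the expression for the size $S$ and the external
path length $P$ of a suffix tree, we write down the mean over suffix trees built with
$n$ suffixes for both parameters:

\[
E_n(S) = E_n\Big(\sum_{w \in \mathcal{A}^*} [\![ N_w \ge 2 ]\!]\Big)
       = \sum_{w \in \mathcal{A}^*} P_n(N_w \ge 2)
       = \sum_{w \in \mathcal{A}^*} \big(1 - P_n(N_w = 0) - P_n(N_w = 1)\big), \tag{5}
\]

\[
E_n(P) = E_n\Big(\sum_{w \in \mathcal{A}^*} N_w\, [\![ N_w \ge 2 ]\!]\Big)
       = \sum_{w \in \mathcal{A}^*} \big(n p_w - P_n(N_w = 1)\big). \tag{6}
\]

This part is dedicated to finding the asymptotic value for the two probabilities
appearing in the formulae.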







2.1. Combinatorics 

The expressions for the mean size and external path length of a suffix tree make
use of the probabilities $P_n(N_w = 1)$ and $P_n(N_w = 0)$, that is the probabilities to
obtain one (resp. zero) occurrence of the pattern $w$ in the first $n + |w| - 1$ letters
of an infinite word (i.e., an occurrence of $w$ starting at one of the first $n$ letters of
the infinite word).

In order to obtain these probabilities, we introduce ordinary generating functions
counting the number of texts with a given number of occurrences of the
pattern $w$ according to their size: let $O_w(z) = \sum_{n \ge 0} O_n z^n$ be the ordinary generating
function (ogf) counting texts with $w$ occurring only once according to their
size and $N_w(z)$ the ogf counting texts with no occurrence of the pattern $w$.

The possible overlap of the pattern $w$ with itself causes problems in the enumeration
of occurrences of a pattern in a text. This phenomenon is called autocorrelation.
The pattern $w$ of size $k$ has an overlap of size $j$ if $1 \le j \le k$ and the prefix
of size $j$, $P_j$, and the suffix of size $j$, $S_j$, of $w$ coincide ($P_j = S_j$). Graphically, an
overlap of a pattern (white rectangles) looks like this:

[Figure: a pattern overlapping a shifted copy of itself]

where the two black boxes are the prefix and suffix matching one above the other.
For example, $w = 001001001$ has overlaps of size 3 and 6; $|w|$ is always a valid
overlap.
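
The definition of an overlap translates into a one-line check. This is a small illustrative sketch (the helper name `overlaps` is hypothetical), listing every size $j$ for which the prefix and the suffix of size $j$ coincide.

```python
def overlaps(w):
    """Sizes j (1 <= j <= len(w)) whose prefix and suffix of size j coincide."""
    return [j for j in range(1, len(w) + 1) if w[:j] == w[-j:]]
```

Applied to the pattern 001001001 of the text, it returns the sizes 3 and 6 together with the trivial overlap 9.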

The autocorrelation of a pattern is encoded by the autocorrelation polynomials:
the combinatorial one is

\[ C_w(z) = \sum_{i=0}^{k-1} c_i z^i \]

and, since our symbols are produced by a probabilistic source, the probabilistic
version that we need is

\[ c_w(z) = \sum_{i=0}^{k-1} c_i\, P(w_{k-i+1} \cdots w_k)\, z^i, \]

where $c_i = [\![ S_{k-i} = P_{k-i} ]\!]$, meaning there is an overlap of size $k - i$.

For the memoryless sources we are dealing with, the probabilistic autocorre- 
lation polynomial satisfies a neat and useful relation: 

Lemma 2.1.

\[ \sum_{w \in \mathcal{A}^k} c_w(1) = 2^k + k - 1. \]

Proof: On a binary alphabet there are $2^j$ patterns of size $j$ for any $j < k$ and
from each of these $v$, one can build a unique $w$ of size $k$ such that $c_j = 1$ and $v$
is its suffix of size $j$. So there are at most $2^j$ patterns of size $k$ with a given suffix
of size $j$ and satisfying $c_j = 1$. Furthermore, there is no way two different suffixes
of size $j$ can create the same word of size $k$, so there are exactly $2^j$ patterns of size
$k$ satisfying $c_j = 1$ and this for every $j$ between 1 and $k - 1$. Hence,

\[
\sum_{w \in \mathcal{A}^k} c_w(1)
 = \sum_{w \in \mathcal{A}^k} \sum_{j=0}^{k-1} c_j\, P(w_{k-j+1} \cdots w_k)
 = \sum_{j=0}^{k-1} \sum_{\substack{w \in \mathcal{A}^k \\ c_j = 1}} P(w_{k-j+1} \cdots w_k),
\]

but since we just proved that the suffix of size $j$ is enough to obtain a word $w$ of
size $k$ with $c_j = 1$ (and $c_0 = 1$ for all $w \in \mathcal{A}^k$),

\[
\sum_{w \in \mathcal{A}^k} c_w(1) = 2^k + \sum_{j=1}^{k-1} \sum_{v \in \mathcal{A}^j} p_v
 = 2^k + \sum_{j=1}^{k-1} 1 = 2^k + k - 1. \qquad \blacksquare
\]
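
Lemma 2.1 can be checked by brute force for small $k$. The sketch below uses exact rational arithmetic and assumes a memoryless source with $P(0) = p$; all function names are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction
from itertools import product


def pattern_probability(w, p):
    """Probability of the pattern w under a memoryless source with P(0) = p."""
    q = 1 - p
    result = Fraction(1)
    for letter in w:
        result *= p if letter == "0" else q
    return result


def autocorrelation_at_one(w, p):
    """c_w(1): sum over i of c_i times the probability of the last i letters."""
    k = len(w)
    total = Fraction(0)
    for i in range(k):
        if w[:k - i] == w[i:]:  # overlap of size k - i, hence c_i = 1
            total += pattern_probability(w[k - i:], p)
    return total


def lemma_2_1_holds(k, p):
    """Check that the sum of c_w(1) over all patterns of size k is 2^k + k - 1."""
    total = sum(
        autocorrelation_at_one("".join(letters), p)
        for letters in product("01", repeat=k)
    )
    return total == 2 ** k + k - 1
```

Because the identity only relies on the probabilities of all suffixes of a given size summing to 1, it holds for biased sources as well as for the symmetric one.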






In [4, 5], Guibas and Odlyzko devised a combinatorial method based on formal
languages in order to obtain the generating functions counting the number of texts
with a fixed number of occurrences of a given pattern $w$ according to their size. It
leads to the generating functions $O_w(z)$ and $N_w(z)$:

\[ O_w(z) = \frac{z^k}{\big(C(z)(1 - 2z) + z^k\big)^2}, \qquad N_w(z) = \frac{C(z)}{C(z)(1 - 2z) + z^k}, \]

where $k$ is the length of $w$ and $C(z) = C_w(z)$ is the combinatorial autocorrelation polynomial.

When no ambiguity arises, we abbreviate $O_w(z)$ and $N_w(z)$ by $O(z)$ and
$N(z)$. Building on Guibas and Odlyzko's method, we obtain the probabilistic
versions $\mathcal{O}(z)$ and $\mathcal{N}(z)$ of $O(z)$ and $N(z)$; these generating functions count texts
with their probability, so

\[ \mathcal{O}(z) := \sum_{n \ge 0} P_n(N_w = 1)\, z^n = \frac{p_w z^k}{\big(c(z)(1 - z) + p_w z^k\big)^2} \tag{7} \]

and

\[ \mathcal{N}(z) := \sum_{n \ge 0} P_n(N_w = 0)\, z^n = \frac{c(z)}{c(z)(1 - z) + p_w z^k}, \tag{8} \]

where $c(z) = c_w(z)$ is the probabilistic autocorrelation polynomial.


The next step will be to extract the coefficient of order n in these probabilised 
generating functions. 
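
As an illustration of coefficient extraction, the following sketch expands the combinatorial generating functions of Guibas and Odlyzko as power series and lets one compare them with an exhaustive count of binary texts. It is a minimal sketch with hypothetical helper names; it treats the combinatorial (unweighted) versions only, where the coefficient of $z^m$ counts texts of length $m$.

```python
from itertools import product


def autocorrelation_coefficients(w):
    """c_i = 1 when the prefix and suffix of size k - i of w coincide."""
    k = len(w)
    return [1 if w[:k - i] == w[i:] else 0 for i in range(k)]


def multiply(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def series_quotient(numerator, denominator, terms):
    """First `terms` coefficients of numerator/denominator (denominator[0] == 1)."""
    coeffs = []
    for m in range(terms):
        value = numerator[m] if m < len(numerator) else 0
        for i in range(1, min(m, len(denominator) - 1) + 1):
            value -= denominator[i] * coeffs[m - i]
        coeffs.append(value)
    return coeffs


def guibas_odlyzko_counts(w, terms):
    """Coefficients of N_w(z) and O_w(z): texts with zero / exactly one occurrence."""
    k = len(w)
    c = autocorrelation_coefficients(w)
    d = multiply(c, [1, -2])  # C(z)(1 - 2z), a polynomial of degree k
    d[k] += 1                 # ... + z^k
    none = series_quotient(c, d, terms)
    once = series_quotient([0] * k + [1], multiply(d, d), terms)
    return none, once


def count_occurrences(text, w):
    """Number of (possibly overlapping) occurrences of w in text."""
    return sum(text.startswith(w, i) for i in range(len(text) - len(w) + 1))
```

For the pattern 11, for instance, the first coefficients of $N_w(z)$ are 1, 2, 3, 5, 8, the number of binary texts without two consecutive ones.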



2.2. Complex analysis 

In order to isolate the dominant pole $\rho$ of the generating functions $\mathcal{O}_w$ and $\mathcal{N}_w$,
we use Rouché's theorem. The adequate contour is a circle $C$ centered at the
origin with a radius $R$ depending on the position of the first non-trivial 1 in $w$'s
autocorrelation polynomial.

For example, if $c_1 = 1$ we choose $R = 0.5(1 + 1/p)$; then the disc of radius $R$
contains $\rho$ as the unique pole of the generating function. No general formula has yet
been devised for the radius, but we are assured of its existence by Pringsheim's
theorem.

The contour is also the one we use for an application of Cauchy's theorem.
There is only one pole of the generating function $\mathcal{O}(z)$ (or $\mathcal{N}(z)$) inside the disc, so
that

\[
\frac{1}{2i\pi} \oint_C \frac{\mathcal{O}(z)}{z^{n+1}}\, dz
 = \operatorname{Res}\Big(\frac{\mathcal{O}(z)}{z^{n+1}}, 0\Big) + \operatorname{Res}\Big(\frac{\mathcal{O}(z)}{z^{n+1}}, \rho\Big), \tag{9}
\]

where $\operatorname{Res}(f, a)$ means the residue of the function $f$ at $a$.

The modulus of $\mathcal{O}(z)$ can be bounded on the circle $C$. Furthermore, the residue
at 0 of $\mathcal{O}(z)/z^{n+1}$ is the coefficient of order $n$ of $\mathcal{O}(z)$. By expanding the generating
function in a Laurent series near the pole $\rho$, we obtain the residue at $\rho$:

\[
\operatorname{Res}\Big(\frac{\mathcal{O}(z)}{z^{n+1}}, \rho\Big)
 = \frac{p_w\, \rho^{k-n-2}}{F'(\rho)^3} \Big( \big(k - (n+1)\big) F'(\rho) - \rho F''(\rho) \Big),
\]

where $F(z) = c(z)(1 - z) + p_w z^k$.
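
The Laurent expansion behind this residue can be spelled out as follows. This is a sketch of the computation, assuming that $\rho$ is a simple zero of $F$ (so that it is a double pole of $p_w z^{k-n-1}/F(z)^2$); the intermediate steps are not taken from the paper.

```latex
\[
F(z) = F'(\rho)(z-\rho) + \tfrac12 F''(\rho)(z-\rho)^2 + O\big((z-\rho)^3\big),
\]
\[
\frac{1}{F(z)^2} = \frac{1}{F'(\rho)^2 (z-\rho)^2}
  \Big(1 - \frac{F''(\rho)}{F'(\rho)}(z-\rho) + O\big((z-\rho)^2\big)\Big),
\]
\[
p_w z^{k-n-1} = p_w \rho^{k-n-1} + p_w (k-n-1)\rho^{k-n-2}(z-\rho) + O\big((z-\rho)^2\big),
\]
\[
\operatorname{Res}\Big(\frac{p_w z^{k-n-1}}{F(z)^2}, \rho\Big)
 = \frac{p_w (k-n-1)\rho^{k-n-2}}{F'(\rho)^2} - \frac{p_w \rho^{k-n-1} F''(\rho)}{F'(\rho)^3}
 = \frac{p_w \rho^{k-n-2}}{F'(\rho)^3}\Big(\big(k-(n+1)\big)F'(\rho) - \rho F''(\rho)\Big).
\]
```

The residue is the coefficient of $(z-\rho)^{-1}$ in the product of the second and third expansions.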



2.3. Approximation 

This subsection is dedicated to finding a good approximation for the value of the
dominant pole $\rho$. We recall that $\rho$ is defined as the solution of smallest modulus of

\[ F(z) = c(z)(1 - z) + p_w z^k = 0. \tag{10} \]

Furthermore, due to Pringsheim's theorem [3], we know that $\rho$ is positive real.

Since $F(1) = p_w$ is close to zero on all but a few patterns (the probability
decreases exponentially with the size of the word $w$), and we are dealing with
polynomials, $\rho$ is greater than 1 but close to it.

Let us introduce $a_1$ such that $\rho = 1 + a_1$. We know that $a_1$ is positive, and
that, by (10), it satisfies

\[ -c(1 + a_1)\, a_1 + p_w (1 + a_1)^k = 0. \tag{11} \]

It is hard to solve this equation to get $a_1$, so we introduce $a$ such that

\[ -c(1 + a)\, a + p_w = 0. \]

$a$ is close to $a_1$ since only small terms have been omitted from (11).

Using Rolle's theorem, we obtain

\[ c(1 + a) = c(1) + a\, c'(\beta) = c(1) + p_w \frac{c'(\beta)}{c(1 + a)} \]

for $\beta \in\, ]1, 1 + a[$. Since the quantity $c'(\beta)/c(1 + a)$ is bounded by a constant, and
$p_w$ is very small compared to $c(1)$, one has

\[ c(1 + a) \approx c(1). \]

Finally, in the residue from the previous section, the term $\rho^{-n}$ becomes

\[ \rho^{-n} = (1 + a_1)^{-n} \approx (1 + a)^{-n} \sim \exp(-na) \sim \exp\Big(-\frac{n p_w}{c(1)}\Big). \]


3. Splitting in four 

From now on, we only deal with the external path length parameter since the 
methods used for this parameter apply similarly to the size. 

Let $\Delta$ be the difference between the mean of the external path length $P$ for
a trie under the Poisson model of parameter $n$ and for a suffix tree on $n$ suffixes:

\[ \Delta := E_{\mathcal{P}(n)}(P) - E_n(P). \]

Collecting results from preceding sections, we obtain

\[ \Delta = \sum_{w \in \mathcal{A}^*} \big( P_n(N_w = 1) - P_{\mathcal{P}(n)}(N_w = 1) \big), \]

where $P_{\mathcal{P}(n)}(N_w = 1)$ is the probability of the event $N_w = 1$ taken over all tries
built on a set of size $z$, where $z$ follows the Poisson law $\mathcal{P}(n)$ of parameter $n$. The
previous section has provided the asymptotic behavior of $P_n(N_w = 1)$, and known
results from poissonisation [2, 9] allow us to write informally

\[ \Delta \sim \sum_{w \in \mathcal{A}^*} n p_w \Big( \exp\Big(-\frac{n p_w}{c(1)}\Big) - \exp(-n p_w) \Big). \tag{12} \]

Remark: Since $c(1) \ge 1$, $\Delta$ is a positive quantity, hence asymptotically and
on average the path length is longer for a trie than for a suffix tree.

The remainder of this section consists of a delicate subdivision of the set of
all patterns in order to control the asymptotic growth that each of these subsets
brings to the sum (12). In [6], it was shown that the sum $\Delta$ was of order $n^{1-\epsilon}$
for some unspecified $\epsilon > 0$, and we want to obtain an explicit bound.

The function $x \mapsto x \exp(-x)$ dominates the behavior of the sum $\Delta$. A perusal
of this function's graph induces a three-part splitting of the set of all patterns on
whether $n p_w$ tends to infinity, to zero or remains "almost constant"; the latter will
also be cut into two according to how close $c_w(1)$ is to 1.
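
To get a feeling for the sum (12), one can evaluate its summands over all patterns up to a given length. The sketch below is a truncated numerical illustration only (the names `c_at_one` and `delta_terms` and the cutoff `max_length` are hypothetical); it does not replace the asymptotic analysis.

```python
import math
from itertools import product


def c_at_one(w, p):
    """Probabilistic autocorrelation polynomial of w evaluated at 1."""
    k, q = len(w), 1 - p
    total = 0.0
    for i in range(k):
        if w[:k - i] == w[i:]:  # overlap of size k - i
            tail = w[k - i:]
            total += p ** tail.count("0") * q ** tail.count("1")
    return total


def delta_terms(n, p, max_length):
    """Summands of (12) for all binary patterns of length 1..max_length."""
    q = 1 - p
    terms = []
    for k in range(1, max_length + 1):
        for letters in product("01", repeat=k):
            w = "".join(letters)
            x = n * p ** w.count("0") * q ** w.count("1")  # n * p_w
            terms.append(x * (math.exp(-x / c_at_one(w, p)) - math.exp(-x)))
    return terms
```

Every summand is nonnegative because $c(1) \ge 1$, and it vanishes exactly for patterns without a non-trivial overlap.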

3.1. Small sizes 

First, we focus on patterns of small sizes, which are relatively few.
Bounding crudely their contribution to the sum $\Delta$ by the product of their number
and their largest possible term will be sufficient to prove they do not contribute much to
$\Delta$'s growth.

We define the small-sized words as those complying with

\[ |w| = k \le \frac{5}{6} \log_{1/q} n = \frac{5}{6}\, C_q \log n =: k_s(n), \]

for $C_q := (\log 1/q)^{-1}$.

Intuitively, a pattern of small size is one that satisfies $n p_w \to \infty$; or in a more
quantitative approach

\[ n p_w \ge n^{1/6}. \tag{13} \]

I call a slice of patterns the set of all patterns of a given length. It is more comfortable
to handle slices, so the definition for small-sized patterns means those in
slices satisfying (13).

We have a binary alphabet, thus the number of patterns of size smaller than
$k_s(n)$ is of order $n^{5C_q/6}$ and for any small-sized pattern

\[ n p_w \exp\Big(-\frac{n p_w}{c(1)}\Big) \le n^{1/6} \exp\Big(-\frac{n^{1/6}}{c(1)}\Big) \le n^{1/6} \exp\big(-(1 - p)\, n^{1/6}\big). \]

Finally, the patterns of small size contribute to $\Delta$ less than

\[ n^{5C_q/6}\, n^{1/6} \exp\big(-(1 - p)\, n^{1/6}\big), \]

and due to the dominance of the exponential decrease, to $o(1)$ (this suffices for our
goals).







3.2. Large sizes 

This part deals with patterns of large size, defined as those whose respective slices
satisfy the property $n p_w \le n^{-1/2}$. The intuition is to catch patterns with $n p_w \to 0$,
but we refine this condition quantitatively into $n p_w \le n^{-1/2}$ before resorting to slices.

These patterns indeed are of large sizes: for a symmetric source, for example,
$n p_w = n 2^{-k} \le 1/\sqrt{n}$ implies that $k$, the length of $w$, satisfies $k \ge 1.5 \log n$. This
definition translates on the length of the patterns into

\[ k \ge 1.5 \log_{1/p} n = 1.5\, C_p \log n =: k_l(n). \]

With this definition, all large patterns obey $n p_w \to 0$; so a Taylor expansion of
the function $x \mapsto x \exp(-x)$ near zero yields

\[
\sum_{k \ge k_l(n)} \sum_{w \in \mathcal{A}^k} n p_w \Big( \exp\Big(-\frac{n p_w}{c(1)}\Big) - \exp(-n p_w) \Big)
 \sim \sum_{k \ge k_l(n)} \sum_{w \in \mathcal{A}^k} n^2 p_w^2 \Big( 1 - \frac{1}{c(1)} \Big).
\]

Obviously there are an infinite number of large patterns, which prevents us
from using a brute-force upper bound like for the small-sized patterns. However, one
has

\[ \sum_{w \in \mathcal{A}^k} p_w^2 = \sum_{i=0}^{k} \binom{k}{i} p^{2i} q^{2(k-i)} = (p^2 + q^2)^k =: A_p^k, \]

for a constant $A_p$ smaller than 1 and depending only on $p$.

Furthermore we have already seen that $1 \le c(1) \le 1/(1 - p)$, hence

\[
\sum_{k \ge k_l(n)} \sum_{w \in \mathcal{A}^k} n p_w \Big( \exp\Big(-\frac{n p_w}{c(1)}\Big) - \exp(-n p_w) \Big)
 \le p\, n^2 \sum_{k \ge k_l(n)} A_p^k \tag{14}
\]

\[
 = O\big(n^2 A_p^{k_l(n)}\big). \tag{15}
\]

This bound grows fastest in the case of a symmetric source.
This case brings asymptotically an $O(\sqrt{n})$ contribution to the sum $\Delta$; but for
other letter occurrence probabilities, we can improve up to an $O(1/n)$ growth.
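
The identity $\sum_{w} p_w^2 = (p^2 + q^2)^k$ over all patterns of size $k$ is easy to confirm by enumeration. A minimal sketch with exact rationals follows; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def sum_of_squared_probabilities(k, p):
    """Sum of p_w^2 over all binary patterns w of size k, with P(0) = p."""
    q = 1 - p
    total = Fraction(0)
    for letters in product("01", repeat=k):
        zeros = letters.count("0")
        total += (p ** zeros * q ** (k - zeros)) ** 2
    return total
```

For the symmetric source the constant $p^2 + q^2$ equals 1/2, its smallest possible value.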



3.3. Periodic patterns 

We introduce $B_k := \{w : |w| = k,\ c(1) \ge 1 + 2^{-k/2}\}$ as the set of periodic
patterns of size $k$. This part aims at patterns of intermediate size (neither small
nor large) with the additional constraint that they are periodic. We will abusively refer
to these patterns as periodic.

A periodic pattern has the first non-trivial 1 in its autocorrelation polynomial
for a small index $j$, and therefore $w$ is formed of repetitions of its suffix of length
$j$. For these patterns, the second term in $c(1)$ is the probability of the suffix of size
$j$ of $w$. But since $j$ is small, the probability is large and $c(1)$ is relatively far from
1.

There are relatively few periodic patterns in $B_k$:

Lemma 3.1.

\[ \# B_k \le (k - 1)\, 2^{k/2}. \tag{16} \]

Proof: We start by partitioning the patterns of size $k$ into two:

\[ \sum_{w \in \mathcal{A}^k} c(1) = \sum_{w \in B_k} c(1) + \sum_{w \notin B_k} c(1). \]

For the patterns in $B_k$, $c(1) \ge 1 + 2^{-k/2}$ and for the others, $c(1) \ge 1$. Using Lemma
2.1, we get

\[ \sum_{w \in \mathcal{A}^k} c(1) = 2^k + k - 1 \ge \# B_k\, (1 + 2^{-k/2}) + 1 \cdot (2^k - \# B_k) = \# B_k\, 2^{-k/2} + 2^k, \]

which gives (16).

From there we bound the contribution of intermediate and periodic patterns:

\[
\sum_{k = k_s(n)}^{k_l(n)} \sum_{w \in B_k} n p_w \Big( \exp\Big(-\frac{n p_w}{c(1)}\Big) - \exp(-n p_w) \Big)
 \le \sum_{k = k_s(n)}^{k_l(n)} \# B_k \max_{w \in B_k} \Big\{ n p_w \exp\Big(-\frac{n p_w}{c(1)}\Big) \Big\}
 \le K \sum_{k = k_s(n)}^{k_l(n)} (k - 1)\, 2^{k/2},
\]

where $K$ is any upper bound on the function $x \mapsto x \exp(-x/c(1))$ on positive reals.

We finally obtain a contribution of order $O(n^{0.75 C_p} \log n)$ for the periodic
patterns. Since we are looking for a sublinear contribution to $\Delta$, this necessitates
$0.75\, C_p < 1$, hence $p < 2^{-3/4} \approx 0.5946035575$.

The limiting value for $p$ depends on the arbitrary (but larger than
1) factor defining the bound $k_l(n)$ (here for example this factor is 1.5). We could
extend the boundary value of $p$ to $1/\sqrt{2}$ at the expense of a worse error term.
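
The counting bound of Lemma 3.1, $\# B_k \le (k-1)2^{k/2}$ as it follows from the proof, can be observed on small slices. The sketch below enumerates all patterns of size $k$ in floating point; `c_at_one` and `periodic_count` are hypothetical names used for illustration.

```python
from itertools import product


def c_at_one(w, p):
    """Probabilistic autocorrelation polynomial of w evaluated at 1."""
    k, q = len(w), 1 - p
    total = 0.0
    for i in range(k):
        if w[:k - i] == w[i:]:  # overlap of size k - i
            tail = w[k - i:]
            total += p ** tail.count("0") * q ** tail.count("1")
    return total


def periodic_count(k, p):
    """Number of patterns w of size k with c(1) >= 1 + 2^(-k/2)."""
    threshold = 1 + 2 ** (-k / 2)
    return sum(
        1
        for letters in product("01", repeat=k)
        if c_at_one("".join(letters), p) >= threshold
    )
```

Comparing `periodic_count(k, p)` with `(k - 1) * 2 ** (k / 2)` for $k$ up to 10 shows how few patterns qualify as periodic.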



3.4. Aperiodic patterns 

The aperiodic patterns are those remaining: they are of intermediate sizes (between
$k_s(n)$ and $k_l(n)$) and do not belong to the set $B_k$. For these, $c(1)$ is very close to
1, hence the difference between $\exp(-n p_w)$ and $\exp(-n p_w/c(1))$ is small.
Since $w \notin B_k$, one has

\[ \frac{1}{c(1)} \ge \frac{1}{1 + 2^{-k/2}} \ge 1 - 2^{-k/2}, \]

so that

\[ n p_w \Big( \exp\Big(-\frac{n p_w}{c(1)}\Big) - \exp(-n p_w) \Big) \le n p_w\, e^{-n p_w} \Big( \exp\big(n p_w 2^{-k/2}\big) - 1 \Big). \]

We are going to use a Taylor expansion of the exponential function near zero,
but in order for the expansion to apply we need $n p_w 2^{-k/2} \to 0$ for all aperiodic
patterns; this leads to the condition $p < p_0 \approx 0.5469205467$, where $p_0$ is the unique
real solution to

\[ \Big( \frac{p}{\sqrt{2}} \Big)^{5/6} + p - 1 = 0. \]
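
The numerical value of $p_0$ can be recovered with a few lines of bisection, since the left-hand side of the equation is increasing in $p$. This is an illustrative sketch; the function names are hypothetical and the bracket $[0.5, 1]$ is an assumption justified by the convention $p \ge 1/2$.

```python
def aperiodic_condition(p):
    """Left-hand side of (p / sqrt(2))^(5/6) + p - 1 = 0."""
    return (p / 2 ** 0.5) ** (5 / 6) + p - 1


def solve_by_bisection(f, low, high, iterations=100):
    """Root of an increasing function f with f(low) < 0 < f(high)."""
    for _ in range(iterations):
        mid = (low + high) / 2
        if f(mid) < 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


p0 = solve_by_bisection(aperiodic_condition, 0.5, 1.0)
periodic_threshold = 2 ** (-3 / 4)  # bound on p from the periodic patterns
```

The value `p0` is the binding constraint, being smaller than the threshold $2^{-3/4}$ obtained for periodic patterns.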







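So, for $p < 0.54$, we can use a Taylor expansion and, since at most $2^k$ patterns of size $k$ lie outside $B_k$, we derive

\[
\sum_{k = k_s(n)}^{k_l(n)} \sum_{w \notin B_k} n p_w \Big( \exp\Big(-\frac{n p_w}{c(1)}\Big) - \exp(-n p_w) \Big)
 \lesssim \sum_{k = k_s(n)}^{k_l(n)} \sum_{w \notin B_k} (n p_w)^2 e^{-n p_w}\, 2^{-k/2}
 \le \beta \sum_{k = k_s(n)}^{k_l(n)} 2^{k/2},
\]

where $\beta = 4e^{-2}$ is the maximum value of $x^2 \exp(-x)$ over the positive reals.

Hence the contribution of the aperiodic patterns to $\Delta$ is $O(n^{0.75 C_p})$. Similarly
to the periodic case, we could increase the upper bound on $p$ up to $2 - \sqrt{2} \approx 0.5857$
at the expense of a less precise error term.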

4. Conclusion 

Each subset of patterns contributes less than $O(n^{0.85})$ to the difference $\Delta$. Hence
the asymptotics for the external path length (resp. size) of a trie and of a suffix
tree only differ by a small quantity. Therefore we have obtained:

Theorem 4.1. For a suffix tree built on the first $n$ suffixes of a string produced by
a memoryless $(p, q)$-source, and for $p < 0.54$, the mean of the external path length
satisfies asymptotically

\[ \frac{n \log n}{H} + (K + \epsilon(n))\, n + O(n^{0.85}), \tag{17} \]

and the size

\[ \frac{n}{H}\,(1 + \epsilon'(n)) + O(n^{0.85}), \tag{18} \]

where $\epsilon$ and $\epsilon'$ are oscillating functions of very small modulus centered at 0.

Future research related to this work includes: providing a larger range for the 
probability p (if possible the whole [0,1] interval); applying this method to other 
parameters of the suffix tree like the fill-up level, the profile or the height; finally, 
extending the source model to the powerful dynamical framework introduced by 
Vallee [10], as it has been done for tries in [2]. 

References 

[1] Clement, J. Arbres Digitaux et Sources Dynamiques. These de doctorat, Universite 
de Caen, Sept. 2000. 

[2] Clement, J., Flajolet, P., and Vallee, B. Dynamical sources in information 
theory: A general analysis of trie structures. Algorithmica 29, 1/2 (2001), 307-369. 

[3] Flajolet, P., and Sedgewick, R. The average case analysis of algorithms: Com- 
plex asymptotics and generating functions. Research Report 2026, Institut National 
de Recherche en Informatique et en Automatique, 1993. 100 pages. 

[4] Guibas, L. j., and Odlyzko, A. M. Periods in strings. Journal of Combinatorial 
Theory, Series A 30 (1981), 19-42. 




Parameters of the suffix tree 



227 



[5] Guibas, L. J., and Odlyzko, A. M. String overlaps, pattern matching, and non- 
transitive games. Journal of Combinatorial Theory. Series A 30, 2 (1981), 183-208. 

[6] Jacquet, P., and Szpankowski, W. Autocorrelation on words and its applications: 
analysis of suffix trees by string-ruler approach. Journal of Combinatorial Theory. 
Series A 66, 2 (1994), 237-269. 

[7l Knuth, D. E. The Art of Computer Proqramminq, vol. 3: Sorting and Searching. 
Addison- Wesley, 1973. 

[8] Mahmoud, H. M. Evolution of Random Search Trees. John Wiley, New York, 1992. 

[9] Szpankowski, W. Average-Case Analysis of Algorithms on Sequences. John Wiley, 
New York, 2001. 

[10] Vallee, B. Dynamical sources in information theory: Fundamental intervals and 
word prefixes. Algorithmica 29, 1/2 (2001), 262-306. 

[11] Ziv, J., and Lempel, A. A universal algorithm for sequential data compression.
IEEE Transactions on Information Theory IT-23 (1977), 337-343.

Julien Fayolle 

INRIA, Projet ALGO 
78153 Le Chesnay Cedex 
France

julien.fayolle@inria.fr




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Arms and Feet Nodes Level Polynomial in 
Binary Search Trees 

Eric Fekete 



ABSTRACT: We define two types of external nodes in binary search trees, 
the arms and the feet, and we study the vector whose coordinates are the two 
associated level polynomials. It appears that a so-called growth matrix A describes
the evolution of the process. A spectral analysis of A provides both the average
and the almost sure asymptotics of this vector.



1. Introduction 

Throughout the paper we are concerned with binary trees whose nodes are labelled
by the elements of

\[ U := \{\emptyset\} \cup \bigcup_{n \ge 1} \{0,1\}^n, \]

the set of finite words on the alphabet {0, 1} (with $\emptyset$ as empty word). A complete
binary tree $T$ is a finite subset of $U$ such that

\[ \emptyset \in T, \qquad uv \in T \Rightarrow u \in T, \qquad u1 \in T \Leftrightarrow u0 \in T. \]

The elements of $T$ are called nodes, and $\emptyset$ is the root; $|u|$, the number of letters in $u$, is the
depth of $u$ (with $|\emptyset| = 0$). We write BinTree for the set of complete binary trees.

A tree $T \in$ BinTree can be described by giving the set $\partial T$ of its leaves (also
called external nodes), which are the nodes with no descendants. The nodes of
$T \setminus \partial T$ are called internal nodes.

Here, we are concerned with binary search trees. A quick definition is the
following: a binary search tree (BST) process is a sequence $(T_n, n \ge 0)$ of complete
binary trees, where $T_n$ has $n$ internal nodes, which grows by successive insertions of
keys, under the so-called random permutation model. Let us describe the dynamics
of the sequence of trees. The tree $T_0$ is reduced to a leaf. The tree $T_{n+1}$ is obtained
from $T_n$ by replacing one of its $n + 1$ leaves by an internal node and thus creating
two new leaves. The insertion is done uniformly on the leaves, which means that each
leaf is chosen with probability $1/(n + 1)$ (for a detailed description, see Mahmoud [7]).

The internal nodes contain data and the leaves represent free memory. All the
following quantities are random variables on the space BinTree, endowed with the
probability given by the random permutation model.

To study the shape of these trees, it is usual to define the profile of the tree $T_n$ by
the collection

\[ U_{n,k} := \#\{u \in \partial T_n,\ |u| = k\}, \qquad k \ge 1, \]

counting the number of leaves of $T_n$ at each level. The profile is encoded by the
so-called level polynomial defined for $z \in \mathbb{C}$ by

\[ W_n(z) := \sum_{k=0}^{n} U_{n,k}\, z^k. \]



Because of the dynamics of the tree process, this polynomial, renormalized by its
expectation, is an $\mathcal{F}_n$-martingale, where $\mathcal{F}_n$ is the $\sigma$-field generated by all the events
$\{u \in T_j\}_{j \le n,\, u \in U}$. More precisely, we call BST martingale

\[ M_n(z) := \frac{W_n(z)}{C_n(z)} = \frac{1}{C_n(z)} \sum_{u \in \partial T_n} z^{|u|}, \]

where $C_0(z) = 1$ and

\[ C_n(z) := \prod_{k=0}^{n-1} \frac{k + 2z}{k + 1} = (-1)^n \binom{-2z}{n}, \qquad n \ge 1. \]



It is proved (in Chauvin & al. [1, 2] and Jabbour [6]) that in the supercritical range
$z \in (z^-, z^+)$, this martingale converges in $L^1$ to a nondegenerate limit $M_\infty(z)$ and
converges a.s. to 0 elsewhere, in particular for the critical values $z^-$ and $z^+$. These
values are the two positive solutions of the equation $2z \log z - 2z + 1 = 0$ ($z^- = 0.186\ldots$
and $z^+ = 2.155\ldots$). The BST martingale and the level polynomial are the key tools
to study the profile of binary search trees.
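The two critical values can be recovered numerically. The following is a minimal sketch, assuming only the equation $2z \log z - 2z + 1 = 0$ quoted above; the helper names `g` and `bisect` are illustrative and are not taken from the cited papers.

```python
import math


def g(z: float) -> float:
    """Left-hand side of the critical equation 2 z log z - 2 z + 1 = 0."""
    return 2.0 * z * math.log(z) - 2.0 * z + 1.0


def bisect(f, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    if flo * f(hi) > 0:
        raise ValueError("f has the same sign at both endpoints")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if flo * fmid <= 0:
            hi = mid              # the sign change is in [lo, mid]
        else:
            lo, flo = mid, fmid   # the sign change is in [mid, hi]
    return 0.5 * (lo + hi)


# g'(z) = 2 log z, so g decreases on (0, 1) and increases on (1, oo);
# since g(1) = -1 < 0 there is exactly one root on each side of z = 1.
z_minus = bisect(g, 1e-9, 1.0)
z_plus = bisect(g, 1.0, 10.0)
print(z_minus, z_plus)
```

Bisection is used here only because the sign pattern of the function is known in advance; any root finder would do.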



The motivation of this paper is to refine the study of the profile of binary
search trees. Following Dekking & al. [4] we define two kinds of external nodes of
a binary search tree: those whose brother is an internal node (the arms) and
those whose brother is also an external node (the feet). We can already notice
that the feet come in pairs and the arms do not. Figure 1.1 shows a binary search
tree whose arms are represented by + and feet by x. Notice that in computer
science a binary search tree is a set of pointers. Each internal node contains 0, 1
or 2 pointers, pointing at each of its descendants only if the corresponding subtree
contains at least 2 internal nodes. Thus, every internal node (except the root of
the tree) whose descendants are not two feet receives a pointer from its father.
Hence the cost of the tree, which is the total number of pointers, depends on the
number of feet.

For all $k \ge 0$ and $n \ge 0$ we first define two random variables $X^a_{n,k}$ and $X^f_{n,k}$ which
respectively count the number of arms and feet of depth $k$ in a binary search tree
of size $n$. Of course if $k > n$ we have $X^a_{n,k} = X^f_{n,k} = 0$. One can also notice that
$X^a_{n,k} + X^f_{n,k} = U_{n,k}$, the number of leaves of depth $k$.

Figure 1.1. Arms and feet of a binary search tree

We now define two random polynomials, viewed as complex
functions, which will help us to study the profile of the binary search tree. We call
them respectively arms and feet polynomials:

\[ W^a_n(z) := \sum_{k=0}^{n} X^a_{n,k}\, z^k, \qquad W^f_n(z) := \sum_{k=0}^{n} X^f_{n,k}\, z^k. \]

Notice that $W^a_n(z) + W^f_n(z) = W_n(z)$.




with 



Figure 1.2. A binary search tree of size 12 
W^{z) = z^^ 2z^ + 2z^ and W^{z) = + 4z\ 



We lastly define the vector

\[ \mathbf{W}_n(z) := \begin{pmatrix} W^a_n(z) \\ W^f_n(z) \end{pmatrix} \]

whose coordinates are the arms and the feet polynomials defined above, and we
are interested in the asymptotics of $\mathbf{W}_n(z)$. Note that $\mathbf{W}_n(z)$ is a powerful
object since it counts the number of arms and feet nodes at each level.
Moreover, taking $z = 1$, the coordinates of $\mathbf{W}_n(1)$ count the total number of arms
and feet nodes in the tree.

As in Dekking & al. [4] or Jabbour [6] we use a so-called dynamical or forward
method, which means that we take advantage of the evolution of the process between
$n$ and $n + 1$ (see section 2). This method is complementary to the so-called
"divide and conquer" method, which takes advantage of similarity and independence
between the subtrees of the root. In such a forward method, martingale
properties help to find strong limiting results even if limiting distributions are not
explicit (as is the case here). For simplicity and for possible generalizations we
choose to work in a vectorial frame (in dimension 2 here) instead of working with
generating functions as in Dekking & al. [4]. Notice that Dekking & al. in [5] work
on more general patterns than arms and feet, but using non polynomial vectors
and counting the total number of patterns instead of counting level by level. It
appears in sections 2 and 3 that the spectrum of the matrix driving the evolution
of the process plays a crucial role. This is the reason why the study runs along the
same lines as in [3] for m-ary search trees. In section 4 we finally prove our main
result on $\mathbf{W}_n(z)$:

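Theorem 1.1. For every $z \in B_c(0, 1) \cap \{z : \Re(z) > 3/4\}$,

\[ \frac{\mathbf{W}_n(z)}{C_n(z)} - M_\infty(z)\, u(z) \longrightarrow 0 \quad \text{in } L^2 \text{ and a.s.,} \]

where $u(z) = \frac{1}{2z+1} \begin{pmatrix} 1 \\ 2z \end{pmatrix}$, $M_\infty(z)$ is the limit of the BST martingale $M_n(z)$,
$\Re(z)$ is the real part of $z$ and $B_c(z_0, r)$ is the closed ball with center $z_0$ and radius $r$
($B(z_0, r)$ is the corresponding open ball).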

Since $X^a_{n,k}$ and $X^f_{n,k}$ are the coefficients of the two polynomials $W^a_n(z)$ and
$W^f_n(z)$, these results should help to find the actual profile of arms and feet in
binary search trees.

Taking $z = 1$ in the theorem, the coordinates of $\mathbf{W}_n(1)$ respectively count
the total number of arms and feet in $T_n$. We have the following strong law of large
numbers:

Corollary 1.2. 

W^{1) — -(n + 1) 0 almost surely 

O 

and 

(1) — -(n + 1) > 0 almost surely. 

3 



2. Evolution of the process 

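To describe the evolution of the process $(T_n)_{n \in \mathbb{N}}$ we denote by $K_n$ the $n$-th key
inserted, by $D_n$ the insertion depth of $K_n$, and we set
$B^a_n := \{K_n \text{ is inserted in an arm}\}$ and $B^f_n := \{K_n \text{ is inserted in a foot}\}$.

We recall that in a given binary search tree with $n$ internal nodes all the leaves
have equal probability to receive $K_{n+1}$. Then

\[ P(D_{n+1} = k,\ B^a_{n+1} \mid \mathcal{F}_n) = \frac{X^a_{n,k}}{n+1}, \tag{1} \]

\[ P(D_{n+1} = k,\ B^f_{n+1} \mid \mathcal{F}_n) = \frac{X^f_{n,k}}{n+1}, \tag{2} \]

and we have two cases for the insertion of $K_{n+1}$:

If $K_{n+1}$ is inserted in an arm node at level $k$ we lose an arm at this level and we
gain two feet at level $k + 1$. If $K_{n+1}$ is inserted in a foot node at level $k$ we lose
two feet and gain an arm at this level, and we gain two feet at level $k + 1$. Thus
we can describe the evolution of the process as a random walk:

\[ \mathbf{W}_{n+1}(z) = \mathbf{W}_n(z) + \Delta_{n+1}(z) \tag{3} \]

where

\[ \Delta_{n+1}(z) = \begin{cases} \begin{pmatrix} -z^k \\ 2z^{k+1} \end{pmatrix} & \text{on } B^a_{n+1} \text{ if } D_{n+1} = k, \\[2ex] \begin{pmatrix} z^k \\ -2z^k + 2z^{k+1} \end{pmatrix} & \text{on } B^f_{n+1} \text{ if } D_{n+1} = k. \end{cases} \]

Calling

\[ \delta_1(z) = \begin{pmatrix} -1 \\ 2z \end{pmatrix} \quad \text{and} \quad \delta_2(z) = \begin{pmatrix} 1 \\ 2z - 2 \end{pmatrix} \]

we get

\[ \Delta_{n+1}(z) = \sum_{k=0}^{n} \Big( z^k \delta_1(z)\, \mathbf{1}_{\{D_{n+1} = k\} \cap B^a_{n+1}} + z^k \delta_2(z)\, \mathbf{1}_{\{D_{n+1} = k\} \cap B^f_{n+1}} \Big). \]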
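The growth rule described above can be simulated directly. The sketch below is illustrative only: leaves are stored as words over {0, 1}, a uniformly chosen leaf is split at each step, and arms and feet are then classified by looking at the sibling of each leaf. Counting the single leaf of $T_0$ as an arm is an assumption made here; it is the convention under which the first insertion "loses an arm and gains two feet". The function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def grow_bst_leaves(n: int, rng: random.Random) -> List[str]:
    """Leaves of T_n (words over {0,1}) under the random permutation model."""
    leaves = [""]                        # T_0 is reduced to one leaf, the root
    for _ in range(n):
        i = rng.randrange(len(leaves))   # uniform choice among the current leaves
        u = leaves[i]
        leaves[i] = u + "0"              # u becomes internal, its two children
        leaves.append(u + "1")           # u0 and u1 are the new leaves
    return leaves


def arms_and_feet_by_level(leaves: List[str]) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Return (arms, feet), each mapping a depth k to the number of such leaves."""
    leaf_set = set(leaves)
    arms: Dict[int, int] = {}
    feet: Dict[int, int] = {}
    for u in leaves:
        if u == "":
            arms[0] = arms.get(0, 0) + 1     # assumed convention for the lone root leaf
            continue
        sibling = u[:-1] + ("1" if u[-1] == "0" else "0")
        target = feet if sibling in leaf_set else arms
        target[len(u)] = target.get(len(u), 0) + 1
    return arms, feet


rng = random.Random(1)
n = 20000
arms, feet = arms_and_feet_by_level(grow_bst_leaves(n, rng))
total_arms, total_feet = sum(arms.values()), sum(feet.values())
print(total_arms / (n + 1), total_feet / (n + 1))
```

The dictionaries `arms` and `feet` play the role of the coefficients of the arms and feet polynomials, so evaluating $\sum_k$ `arms[k]` $z^k$ gives one sample of the first coordinate of the vector.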



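Taking the conditional expectation in (3) and recalling the evolution rules (1) and
(2) one gets

\begin{align*}
E^{\mathcal{F}_n}(\mathbf{W}_{n+1}(z)) &= \mathbf{W}_n(z) + \sum_{k=0}^{n} \Big( z^k \delta_1(z) \frac{X^a_{n,k}}{n+1} + z^k \delta_2(z) \frac{X^f_{n,k}}{n+1} \Big) \\
&= \mathbf{W}_n(z) + \frac{1}{n+1} \big( \delta_1(z) W^a_n(z) + \delta_2(z) W^f_n(z) \big) \\
&= \mathbf{W}_n(z) + \frac{A(z)}{n+1} \mathbf{W}_n(z),
\end{align*}

where $A(z)$ is the endomorphism on $\mathbb{C} \times \mathbb{C}$ defined for any $Y \in \mathbb{C}^2$ by

\[ A(z) Y = \delta_1(z) y_1 + \delta_2(z) y_2, \]

where $(y_1, y_2)$ are the coordinates of $Y$ in the canonical basis of $\mathbb{C}^2$. We denote by
the same $A(z)$ the endomorphism and its matrix, $A(z) = \begin{pmatrix} -1 & 1 \\ 2z & 2z - 2 \end{pmatrix}$.
We finally obtain

\[ E^{\mathcal{F}_n}(\mathbf{W}_{n+1}(z)) = \Big( Id + \frac{A(z)}{n+1} \Big) \mathbf{W}_n(z) \tag{4} \]

and taking the expectation:

\[ E(\mathbf{W}_n(z)) = \Gamma_n \mathbf{W}_0(z), \tag{5} \]

where

\[ \Gamma_n = \prod_{k=1}^{n} \Big( Id + \frac{A(z)}{k} \Big). \]

Notice that $\mathbf{W}_0(z)$ is nonrandom; one easily gets $\mathbf{W}_0(z) = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$.

At this point it is clear that the properties of the spectrum of $A(z)$ drive the
asymptotic behaviour of the expectation $E(\mathbf{W}_n(z))$ and the expansion of $\mathbf{W}_n(z)$.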







3. Average case analysis 

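The eigenvalues of $A(z)$ are $\lambda_1(z) = 2z - 1$ and $\lambda_2(z) = -2$. We assume from now
on $z \ne -\frac{1}{2}$ so that $\lambda_1(z) \ne \lambda_2(z)$ and $A(z)$ is diagonalizable. For each eigenvalue
$\lambda(z)$ we denote by $\pi_{\lambda(z)}$ the projection of $\mathbb{C}^2$ on the eigenspace $\ker(A(z) - \lambda(z) Id)$
relatively to the decomposition

\[ \mathbb{C}^2 = \ker(A(z) - \lambda_1(z) Id) \oplus \ker(A(z) - \lambda_2(z) Id). \]

The two following relations are consequences of the definition of $\pi_{\lambda_1(z)}$ and $\pi_{\lambda_2(z)}$:
for any $X \in \mathbb{C}^2$ we have

\[ A(z) X = \lambda_1(z)\, \pi_{\lambda_1(z)} X + \lambda_2(z)\, \pi_{\lambda_2(z)} X, \tag{6} \]

\[ X = \pi_{\lambda_1(z)} X + \pi_{\lambda_2(z)} X. \tag{7} \]

Using (6) we have $\Gamma_n = C^0_n(\lambda_1(z))\, \pi_{\lambda_1(z)} + C^0_n(\lambda_2(z))\, \pi_{\lambda_2(z)}$, where

\[ C^{n_0}_n(\lambda) = \prod_{k=n_0}^{n-1} \Big( 1 + \frac{\lambda}{k+1} \Big). \]

Remark that $C^0_n(\lambda_1(z)) = C_n(z)$ defined in section 1. We use both notations.

Let us see that the expectation of $\mathbf{W}_n(z)$ only depends on the projection $\pi_{\lambda_1(z)}$:
indeed $C^0_n(-2)$ is zero. For the same reason we study $(\mathbf{W}_n(z))_{n \in \mathbb{N}}$ for $z \in \mathbb{C} \setminus
\{-\frac{s}{2},\ s \in \mathbb{N}\}$ (nevertheless one can easily get $\mathbf{W}_n(0)$).

From (7) we get the useful relation

\[ \mathbf{W}_n(z) = \pi_{\lambda_1(z)} \mathbf{W}_n(z) + \pi_{\lambda_2(z)} \mathbf{W}_n(z). \tag{8} \]

This relation leads us to study the two projections of $\mathbf{W}_n(z)$. The above remark
on $C^0_n(\lambda_2(z))$ indicates this decomposition could coincide with the asymptotic
expansion of $\mathbf{W}_n(z)$. The computation of the expectation confirms this impression:
using (8), (5), the linearity of the projection on an eigenspace and the fact that
for every eigenvalue $\lambda(z)$ of $A(z)$, the projection $\pi_{\lambda(z)}$ commutes with $A(z)$, one
gets $\forall n \ge 2$

\begin{align*}
E(\mathbf{W}_n(z)) &= \pi_{\lambda_1(z)} \Gamma_n \mathbf{W}_0(z) + \pi_{\lambda_2(z)} \Gamma_n \mathbf{W}_0(z) \\
&= C_n(z)\, \pi_{\lambda_1(z)} \mathbf{W}_0(z).
\end{align*}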


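The computation of $\pi_{\lambda_1(z)} \mathbf{W}_0(z)$ is easy; according to (6) and (7) we obtain

\[ u(z) := \pi_{\lambda_1(z)} \mathbf{W}_0(z) = \frac{1}{2z+1} \begin{pmatrix} 1 \\ 2z \end{pmatrix}. \]

Notice that $u(z)$ is a nonrandom vector.

Moreover, we have (see for example Titchmarsh [9] p. 56-58)

\[ C^0_n(\lambda) = \frac{\Gamma(n + \lambda + 1)}{\Gamma(\lambda + 1)\, \Gamma(n + 1)} = \frac{n^{\lambda}}{\Gamma(\lambda + 1)} + O(n^{\lambda - 1}) \tag{9} \]

as $n$ tends to infinity.

We have proved the following proposition.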







Proposition 3.1. For $z \in \mathbb{C} \setminus \{-\frac{s}{2},\ s \in \mathbb{N}\}$ we have

\[ E(\mathbf{W}_n(z)) = C_n(z)\, u(z). \tag{10} \]

Asymptotically, when $n \to \infty$

\[ E(\mathbf{W}_n(z)) = \frac{n^{2z-1}}{\Gamma(2z)}\, u(z) \Big( 1 + O\Big(\frac{1}{n}\Big) \Big). \tag{11} \]

In particular, if $z = 1$, then the coordinates of $\mathbf{W}_n(1)$ count respectively the
number of arms and feet in $T_n$. We recover Mahmoud's results [8]:

\[ E(\mathbf{W}_n(1)) = (n + 1) \begin{pmatrix} 1/3 \\ 2/3 \end{pmatrix}. \]

In other words, we have on average one arm for two feet.

Notice that the average case is totally described by the first projection of $\mathbf{W}_n(z)$.
In order to get the almost sure asymptotics of $\mathbf{W}_n(z)$, let us now see how the
projections behave.
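The identity (10) can be checked in exact arithmetic by iterating the mean recursion coming from (4). This is a minimal sketch under the assumptions that the growth matrix has columns $(-1, 2z)$ and $(1, 2z-2)$ and that the initial vector is $(1, 0)$; the function names are illustrative. The float comparison at the end only probes the leading term of (11) for one real value of $z$.

```python
import math
from fractions import Fraction


def apply_A(z, w):
    """Apply the growth matrix A(z) = [[-1, 1], [2z, 2z - 2]] to w = (arms, feet)."""
    a, f = w
    return (-a + f, 2 * z * a + (2 * z - 2) * f)


def mean_vector(n, z):
    """E(W_n(z)), computed by iterating E(W_{k+1}) = (Id + A(z)/(k+1)) E(W_k)."""
    w = (Fraction(1), Fraction(0))       # assumed starting vector W_0 = (1, 0)
    for k in range(n):
        aw = apply_A(z, w)
        w = (w[0] + aw[0] / (k + 1), w[1] + aw[1] / (k + 1))
    return w


def C(n, z):
    """C_n(z) = prod_{k=0}^{n-1} (k + 2z)/(k + 1); works for Fraction or float z."""
    c = 1
    for k in range(n):
        c = c * (k + 2 * z) / (k + 1)
    return c


def u(z):
    """The eigenvector u(z) = (1, 2z)/(2z + 1)."""
    return (1 / (2 * z + 1), 2 * z / (2 * z + 1))


z = Fraction(3, 4)
for n in range(2, 8):
    expected = tuple(C(n, z) * coord for coord in u(z))
    print(n, mean_vector(n, z) == expected)

# Leading term of the asymptotics for a real z (floating point, illustrative).
n_big, z_real = 10000, 0.75
ratio = C(n_big, z_real) / (n_big ** (2 * z_real - 1) / math.gamma(2 * z_real))
print(ratio)
```

For $n = 1$ the two sides differ, which is why the derivation above is stated for $n \ge 2$: the factor $1 - 2/(k+1)$ attached to the eigenvalue $-2$ only vanishes from $k = 1$ on.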

4. Almost sure analysis 

An elementary computation with $A(z)$ gives the first projection lemma below. It
explicitly computes the first projection in terms of the global leaves polynomial
$W_n(z)$.

Lemma 4.1. (first projection lemma)

Let $Y$ be a vector of $\mathbb{C}^2$ and $(y_1, y_2)$ its coordinates in the canonical basis. We have

\[ \pi_{\lambda_1(z)} Y = (y_1 + y_2)\, u(z). \]

Using this result on $\mathbf{W}_n(z)$ we obtain

\[ \pi_{\lambda_1(z)} \mathbf{W}_n(z) = W_n(z)\, u(z), \]

recalling that $W_n(z) = W^a_n(z) + W^f_n(z)$ is the leaves polynomial of the binary
search tree. Thus, from (4) and picking up the results of J. Jabbour [6] and Chauvin
& al. [2] we get

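Proposition 4.2. Let $z \in \mathbb{C} \setminus \{-\frac{s}{2},\ s \in \mathbb{N}\}$ and $\mathbf{M}_n(z) := \pi_{\lambda_1(z)} \mathbf{W}_n(z) / C_n(z)$. Let

\[ V := \bigcup_{1 < q < 2} V_q := \bigcup_{1 < q < 2} \{z : f(z, q) > 0\} \]

where

\[ f(z, q) = 1 + q(2\Re(z) - 1) - 2|z|^q. \]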

The process (M^(z),Tn)n 'Is a martingale and as n ^ oo, Mn(z) converges, a.s. 
and in , uniformly on every compact subset C of V to the random variable 
Mo^(z)u{z). 

Figure 4.1 helps to visualize the domain $V$. For $q = 2$ the domain $V_q$ is the ball
$B(1, \frac{\sqrt{2}}{2})$. The closer $q$ is to 1, the thinner the domain.
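The description of the domain for $q = 2$ follows from completing the square: $f(z, 2) = 2\,(\tfrac{1}{2} - |z - 1|^2)$. The short sketch below, which assumes only the formula for $f$ given in the proposition, checks this identity on a grid and evaluates $f$ at one sample point for two values of $q$; the sample point $1 + 0.6i$ is an arbitrary choice made for illustration.

```python
def f(z: complex, q: float) -> float:
    """f(z, q) = 1 + q (2 Re z - 1) - 2 |z|^q."""
    return 1.0 + q * (2.0 * z.real - 1.0) - 2.0 * abs(z) ** q


def ball_form(z: complex) -> float:
    """2 (1/2 - |z - 1|^2), positive exactly inside the open ball B(1, sqrt(2)/2)."""
    return 2.0 * (0.5 - abs(z - 1.0) ** 2)


# Compare the two expressions on a coarse grid of the square [-1, 3] x [-2, 2].
max_gap = max(
    abs(f(complex(x / 10.0, y / 10.0), 2.0) - ball_form(complex(x / 10.0, y / 10.0)))
    for x in range(-10, 31)
    for y in range(-20, 21)
)
print(max_gap)

# A point inside V_2 that is no longer in V_q for q close to 1.
sample = complex(1.0, 0.6)
print(f(sample, 2.0), f(sample, 1.05))
```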



According to this proposition, the almost sure order of magnitude of the first
projection of $\mathbf{W}_n(z)$ is clear. We now want to prove that the second projection is
negligible compared to the first one. This leads us to a computation of moments.




Figure 4.1. Domains $V_q$ for $q \in \{1.05, 1.2, 1.5, 2\}$.



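Lemma 4.3. (Lemma of moments)

Let $\sigma_i(z)$ denote the real part of the eigenvalue $\lambda_i(z)$ of $A(z)$, $i \in \{1, 2\}$, and $\langle \cdot | \cdot \rangle$
any positive hermitian form on $\mathbb{C}^2$. Let $Z$ be a vector in $\mathbb{C}^2$. $\forall p \in \mathbb{N}$ we have

(i) for every $z \in B_c(0, 1)$

\[ \begin{cases} E(|\langle Z | \pi_{\lambda_2(z)} \mathbf{W}_n(z) \rangle|^{2p}) = O(n^p) \\ E(|\langle Z | \pi_{\lambda_2(z)} \mathbf{W}_n(z) \rangle|^{2p+1}) = O(n^{p+1}) \end{cases} \]

(ii) if $z \in B_c(0, 1)$ and $\Re(z) > \frac{3}{4}$ then

\[ \begin{cases} E(|\langle Z | \pi_{\lambda_1(z)} \mathbf{W}_n(z) \rangle|^{2p}) = O(n^{2p\sigma_1(z)}) \\ E(|\langle Z | \pi_{\lambda_1(z)} \mathbf{W}_n(z) \rangle|^{2p+1}) = O(n^{2p\sigma_1(z)+1}) \end{cases} \]

if $z \in B_c(0, 1)$ and $\Re(z) = \frac{3}{4}$ then

\[ \begin{cases} E(|\langle Z | \pi_{\lambda_1(z)} \mathbf{W}_n(z) \rangle|^{2p}) = O(n^{2p\sigma_1(z)} \log(n)) \\ E(|\langle Z | \pi_{\lambda_1(z)} \mathbf{W}_n(z) \rangle|^{2p+1}) = O(n^{2p\sigma_1(z)+1} \log(n)) \end{cases} \]

if $z \in B_c(0, 1)$ and $\Re(z) < \frac{3}{4}$ then

\[ \begin{cases} E(|\langle Z | \pi_{\lambda_1(z)} \mathbf{W}_n(z) \rangle|^{2p}) = O(n^p) \\ E(|\langle Z | \pi_{\lambda_1(z)} \mathbf{W}_n(z) \rangle|^{2p+1}) = O(n^{p+1}) \end{cases} \]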

Proof 

We prove the results by induction on $p$.

If $p = 0$ we only have to compute $E(|\langle Z | \pi_{\lambda(z)} \mathbf{W}_n(z) \rangle|)$. Using (3) we almost
surely have

\[ \|\mathbf{W}_{n+1}(z)\| \le \|\mathbf{W}_n(z)\| + \max_{(i,k) \in \{1,2\} \times \{0..n\}} \|z^k \delta_i(z)\| \]

thus assuming $|z| \le 1$

\[ \|\mathbf{W}_{n+1}(z)\| \le \|\mathbf{W}_n(z)\| + \max(\|\delta_1(z)\|, \|\delta_2(z)\|). \]

Therefore, there is some positive constant $c(z)$ such that, almost surely, for every
$n \ge 1$,

\[ \|\mathbf{W}_{n+1}(z)\| \le c(z)\, n. \tag{12} \]

The result for $p = 0$ follows from this inequality.

Suppose now $p \ge 1$.

Let us first write a general result on complex numbers. Let $x$ and $y$ be two complex
numbers, then

\[ |x + y|^{2p} - |x|^{2p-2}\, \Re(\bar{x}(x + 2py)) = p^2 |y|^2 |x|^{2p-2} + 2\Re\big( (\bar{x}^p + p\bar{x}^{p-1}\bar{y})\, z \big) + z\bar{z} \]

where $z = (x + y)^p - x^p - p x^{p-1} y$ is a polynomial in $x$ and $y$ whose coefficients are nonnegative real numbers and whose
degree in $x$ equals $p - 2$. Thus bounding above each term of the right part of the
equation, we can say that there exists a polynomial $P$, whose coefficients are
nonnegative real numbers and whose degree in the first variable equals $2p - 2$,
such that

\[ |x + y|^{2p} \le |x|^{2p-2}\, \Re(\bar{x}(x + 2py)) + P(|x|, |y|). \tag{13} \]
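The identity behind (13) comes from writing $(x + y)^p = x^p + p x^{p-1} y + z$, where $z$ collects the binomial terms of order at least 2 in $y$, and expanding the squared modulus. A minimal numerical check, with illustrative function names and arbitrary sample values, could look as follows.

```python
from math import comb


def tail(x: complex, y: complex, p: int) -> complex:
    """z = (x + y)^p - x^p - p x^(p-1) y, i.e. the binomial terms with k >= 2."""
    return sum(comb(p, k) * x ** (p - k) * y ** k for k in range(2, p + 1))


def lhs(x: complex, y: complex, p: int) -> float:
    """|x + y|^(2p) - |x|^(2p-2) Re(conj(x) (x + 2 p y))."""
    return abs(x + y) ** (2 * p) - abs(x) ** (2 * p - 2) * (x.conjugate() * (x + 2 * p * y)).real


def rhs(x: complex, y: complex, p: int) -> float:
    """p^2 |y|^2 |x|^(2p-2) + 2 Re((conj(x)^p + p conj(x)^(p-1) conj(y)) z) + |z|^2."""
    z = tail(x, y, p)
    xb, yb = x.conjugate(), y.conjugate()
    cross = ((xb ** p + p * xb ** (p - 1) * yb) * z).real
    return p * p * abs(y) ** 2 * abs(x) ** (2 * p - 2) + 2 * cross + abs(z) ** 2


x, y = complex(1.0, 2.0), complex(0.5, -1.0)
for p in (1, 2, 3, 4):
    print(p, lhs(x, y, p), rhs(x, y, p))
```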



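Let us start the computation of $E^{\mathcal{F}_n}(|\langle Z | \mathbf{W}_{n+1}(z) \rangle|^{2p})$:

\[ E^{\mathcal{F}_n}(|\langle Z | \mathbf{W}_{n+1}(z) \rangle|^{2p}) = \sum_{k=0}^{n} \Big( \frac{X^a_{n,k}}{n+1} |\langle Z | \mathbf{W}_n(z) + z^k \delta_1(z) \rangle|^{2p} + \frac{X^f_{n,k}}{n+1} |\langle Z | \mathbf{W}_n(z) + z^k \delta_2(z) \rangle|^{2p} \Big). \]

For every $k$ we apply (13) to $x = \langle Z | \mathbf{W}_n(z) \rangle$ and $y = \langle Z | z^k \delta_1(z) \rangle$, afterwards
to $x = \langle Z | \mathbf{W}_n(z) \rangle$ and $y = \langle Z | z^k \delta_2(z) \rangle$; notice that $P$ is the same for each of the
$2n + 2$ times we use (13). We obtain

\begin{align*}
E^{\mathcal{F}_n}(|\langle Z | \mathbf{W}_{n+1}(z) \rangle|^{2p})
&\le \sum_{k=0}^{n} \frac{X^a_{n,k}}{n+1} \Big( |\langle Z | \mathbf{W}_n(z) \rangle|^{2p-2}\, \Re\big( \overline{\langle Z | \mathbf{W}_n(z) \rangle} ( \langle Z | \mathbf{W}_n(z) \rangle + 2p \langle Z | z^k \delta_1(z) \rangle ) \big) \\
&\qquad\qquad + P(|\langle Z | \mathbf{W}_n(z) \rangle|, |\langle Z | z^k \delta_1(z) \rangle|) \Big) \\
&\quad + \sum_{k=0}^{n} \frac{X^f_{n,k}}{n+1} \Big( |\langle Z | \mathbf{W}_n(z) \rangle|^{2p-2}\, \Re\big( \overline{\langle Z | \mathbf{W}_n(z) \rangle} ( \langle Z | \mathbf{W}_n(z) \rangle + 2p \langle Z | z^k \delta_2(z) \rangle ) \big) \\
&\qquad\qquad + P(|\langle Z | \mathbf{W}_n(z) \rangle|, |\langle Z | z^k \delta_2(z) \rangle|) \Big) \\
&\le |\langle Z | \mathbf{W}_n(z) \rangle|^{2p-2}\, \Re\Big( \overline{\langle Z | \mathbf{W}_n(z) \rangle}\, \big\langle Z \big| \big( Id + \tfrac{2p}{n+1} A(z) \big) \mathbf{W}_n(z) \big\rangle \Big) \\
&\quad + \sum_{k=0}^{n} \frac{X^a_{n,k}}{n+1} P(|\langle Z | \mathbf{W}_n(z) \rangle|, |\langle Z | z^k \delta_1(z) \rangle|) \\
&\quad + \sum_{k=0}^{n} \frac{X^f_{n,k}}{n+1} P(|\langle Z | \mathbf{W}_n(z) \rangle|, |\langle Z | z^k \delta_2(z) \rangle|).
\end{align*}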

Then apply the above inequality to $\pi^*_{\lambda(z)} Z$ instead of $Z$, where $f^*$ denotes the
adjoint endomorphism of $f$ relative to $\langle \cdot | \cdot \rangle$, for any eigenvalue $\lambda(z)$ of $A(z)$, and
take the expectation.

E(l<Z|7rA(,)W„+i(2)>pP) 

^ (1 + ^^)E(|<Z|7t,(,)W„(^)>|2J’) 

n / -^A 

+ E®( < ^Ka(.)W„(z) > 1,1 < Z|zV;,(,),5l(^) > I) 

fc=l 

+ ^^^(1 < ^k;,(,)W„(2) > I, I < Z|z*^7rA(,)<52(z) > |)j. 

Denoting aij the coefficients of polynomial P, we have 

n 

E rXT^d < ^Ka(.)W„(z) > 1, 1 < Z1z'=7T,(.)5i(z) > I) 

fc=l 

n 2p—2 2p 

= E E E“*.^i < ^kAww„(z) > ri < ^|z^7t,(,),5i(z) > p 

k=l ^ i=0 j=Q 

2p-2 2p 

= EE ttij < Z|7Ta(^)W„(z) > PI < 

2 = 0 j = 0 



Z\ttx(^z)5i{z) > 



n + 1 



Assuming | 2 :| < 1, notice that ^ < 1 a.s., and use the non-negativity of the 

coefficients of polynomial P, so 

E(| < Zl7T,(,)W„+i(z) > \^n < (1 + ^^)E(| < Z|7T,(,)W„(z) > |2P) 
+E(P(| < Z|7 Ta(^)W„(z) > I, I < Z\^^x(^)Sl{z) > |)) 

+E(P(| < Z|7Ta(z)W„(z) > 1,1 < Z\tTx(z) 62 {z) > D), 
and by induction 



E(| < Z|7Ta(.)W„(z) > |2P) < C«„(2pa(z))(^E(| < Z|7 Ta(.)W„„(z) > 1^^) 
E(Q(| < Z|7TA(,)Wfe(z) > D) 



+ E 

/e=no + l 



C!^t\2pa{z)) 



(14) 



where no = 0 if A( 2 :) = and no = 4 if A(z) = ^ 2 {z) and where Q is a 

polynomial of degree 2p — 2. 

It remains now to be seen whether the series in (14) is convergent or not. Let 
us detail the end of the computation for A( 2 ;) = A 2 ( 2 :), it is analogous for Ai(z). 
Assuming the result for all integers < 2p, the order of magnitude of E((3(| < 
^kA(^)W/c(z) > D) is bounded above by the term of highest degree in Q, that 
is E(| < Z\7rx^z)^k(z) > 1^^“^), which is a (D{k^~^). Thus according to (9), 
C^+i(2p^2(^)) r\j k and thus the general term of the series is of size kP 
The series diverges and this gives the result for the even moments. 

For Xi{z) let us just say that the comparison between %{z) and | appears from 
the comparison between the expectation of Q{\ < Z\7Tx^(z)^k{z) > and 

C^+i(2pai(z)). 







Using (12) the result for the odd moments is a consequence of the inequality

\[ E(|\langle Z | \mathbf{W}_n(z) \rangle|^{2p+1}) \le E(|\langle Z | \mathbf{W}_n(z) \rangle|^{2p}) \times \max_{\Omega} |\langle Z | \mathbf{W}_n(z) \rangle|, \]

where $\Omega$ is the underlying probability space defined as BinTree in section 1. □

One can notice that a precise computation can give us the actual size of 
E(1 < Z\'Kxi(z)^n{z) > 1^) and E(| < Z\'Kx 2 {z)^n{z) > P). Indeed for p = 1 the 
degree of P in the first variable is zero and we don’t have to add the assumption 
1^1 < 1- 

The lemma of moments is used taking (Zi, Z 2 ) an orthonormal basis of C^, 
so that we get 

||X||^ < I < ZijX > 1^ + 1 < Z 2 IX > 1^ 

for any integer p, with equality if p = 2. We can now prove our main result on 
W,(z). 



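Proof of Theorem 1.1

If $z$ belongs to $B_c(0, 1) \cap \{\Re(z) > \frac{3}{4}\}$ then $z$ belongs to $V$, thus according to
proposition 4.2 we have

\[ \frac{\pi_{\lambda_1(z)} \mathbf{W}_n(z)}{C_n(z)} = M_\infty(z)\, u(z) + \varepsilon_n(z) \]

where $\varepsilon_n(z) \to 0$ almost surely. Furthermore, using lemma 4.3 for $p = 1$,
one can prove that $\varepsilon_n(z) \to 0$ in $L^2$ for all $z \in B(1, \frac{\sqrt{2}}{2})$, thus for $z \in
B_c(0, 1) \cap \{\Re(z) > \frac{3}{4}\}$. From (8) we get

\[ \frac{\mathbf{W}_n(z)}{C_n(z)} - M_\infty(z)\, u(z) = \varepsilon_n(z) + \frac{\pi_{\lambda_2(z)} \mathbf{W}_n(z)}{C_n(z)}. \]

We denote $\varepsilon'_n(z) = \pi_{\lambda_2(z)} \mathbf{W}_n(z) / C_n(z)$. We have to prove that $\varepsilon'_n(z) \to 0$ in $L^2$ and
almost surely.

We obtain the $L^2$ convergence with lemma 4.3 for $p = 1$, comparing the order
of magnitude of $E(|\langle Z | \pi_{\lambda_2(z)} \mathbf{W}_n(z) \rangle|^2)$ and $|C_n(z)|^2$.

Using the Borel-Cantelli lemma one can prove that for any sequence of random
variables $(X_n)_{n \in \mathbb{N}}$, if for any $\varepsilon > 0$, $\sum_n P(|X_n| > \varepsilon) < \infty$ then $X_n \to 0$
almost surely. Furthermore the Markov inequality implies that it is sufficient to
find an integer $p \ge 1$ such that $\sum_n E(|X_n|^{2p})$ is finite.

Using lemma 4.3 we get

\[ E(|\langle Z | \varepsilon'_n(z) \rangle|^{2p}) = O(n^{p(1 - 2\sigma_1(z))}), \]

which is the general term of a convergent series if $p$ is large enough, according to
the fact that $1 - 2\sigma_1(z) < 0$, which is equivalent to $\Re(z) > \frac{3}{4}$. □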



References 

[1] B. Chauvin, M. Drmota, and J. Jabbour-Hattab. The profile of binary search trees.
Ann. Appl. Prob., 11:1042-1062, 2001.

[2] B. Chauvin, T. Klein, J.F. Marckert, and A. Rouault. Martingales, Embedding and
Tilting of Binary Trees. Preprint available at http://fermat.math.uvsq.fr/~marckert/papers.html,
September 2003.

[3] B. Chauvin and N. Pouyanne. m-ary search trees when m > 26: a strong asymptotics
for the space requirements. To appear in Random Structures and Algorithms, 2004.
Preprint available at http://fermat.math.uvsq.fr/~chauvin.

[4] F. M. Dekking, S. De Graaf, and L. E. Meester. On the node structure of binary
search trees. Trends in Mathematics. Birkhäuser Verlag, Basel, pages 31-40, 2000.

[5] F. M. Dekking and L. E. Meester. An almost sure result for path lengths in binary
search trees. Adv. Appl. Prob., 35:363-376, 2003.

[6] J. Jabbour-Hattab. Martingales and large deviations for binary search trees. Random
Structures and Algorithms, 19:112-127, 2001.

[7] H. Mahmoud. Evolution of Random Search Trees. John Wiley, New York, 1992.

[8] H. M. Mahmoud. The expected distribution of degrees in random binary search trees.
Comput. J., 29:36-37, 1986.

[9] E. C. Titchmarsh. Theory of functions. Oxford University Press, 1986, second edition.

Eric Fekete 

LAMA Laboratoire de Mathematiques, UMR 8100, Batiment Fermat, Universite 

de Versailles Saint Quentin F- 78035 Versailles. 

fekete@math.uvsq.fr 







Random Records and Cuttings in Complete 
Binary Trees 

Svante Janson 



ABSTRACT: We study the number of records in a complete binary tree 
with randomly labeled vertices or edges. Equivalently we may study the number 
of random cuttings required to eliminate a complete binary tree. 

The distribution is, after normalization, asymptotically a periodic function of
lg n - lg lg n; thus there is no true asymptotic distribution but a family of limits of
different subsequences; these limits are similar to a 1-stable distribution but have
some periodic fluctuations.



1. Introduction 

Let each vertex $v$ in a rooted tree $T$ have a random value $\lambda_v$ attached to it, and
assume that these values are i.i.d. with a continuous distribution (so that a.s. there
are no ties). Say that a value $\lambda_v$ is a record if it is the smallest value in the path
from the root to $v$. Let $X_v(T)$ denote the (random) number of records. Note that
this generalizes the classical record problem (which is the case when $T$ is a path),
see for example [9].

Alternatively, we may attach random values to the edges, and let $X_e(T)$
denote the number of edges with record values (along the path from the root).

It is obvious that the choice of common distribution of the labels does not 
affect the result, and that we as well can count the values that are largest. We can 
also let the labels be a random permutation of {1, . . . , n}. 
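To make the definition concrete, here is a small sketch that counts vertex records for a given labelling. It assumes a hypothetical encoding in which vertices are numbered $0, \ldots, n-1$ with every parent index smaller than its child (the root has parent `None`); for the complete binary tree the usual heap numbering, with parent $(v - 1) // 2$, has this property.

```python
from typing import List, Optional, Sequence


def count_vertex_records(parent: Sequence[Optional[int]], label: Sequence[float]) -> int:
    """Number of vertices whose label is the smallest on the path from the root.

    Assumes parent[v] < v for every non-root vertex, so that the minimum along
    the path to the parent is already known when v is visited.
    """
    path_min: List[float] = [0.0] * len(parent)
    records = 0
    for v, p in enumerate(parent):
        if p is None:                      # the root label is always a record
            path_min[v] = label[v]
            records += 1
        else:
            if label[v] < path_min[p]:     # strictly smaller than all its ancestors
                records += 1
            path_min[v] = min(path_min[p], label[v])
    return records


def heap_parents(n: int) -> List[Optional[int]]:
    """Parent array of the complete binary tree on n vertices (heap numbering)."""
    return [None] + [(v - 1) // 2 for v in range(1, n)]


# Labels given as a permutation, as allowed by the remark above.
print(count_vertex_records(heap_parents(7), [3, 1, 4, 0, 5, 2, 6]))
```

When the tree is a path, the same function counts the classical records of a sequence.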

The same random variables appear when we consider random cuttings of
the tree $T$ defined as follows, see [6]. Make a random cut by choosing one vertex
[edge] at random. Delete this vertex [edge] so that the tree separates into two
parts, and keep only the part containing the root. Continue recursively until the
root is cut [only the root is left]. Then the (random) number of cuts made is $X_v(T)$
[$X_e(T)$]. (More precisely, these random variables have the same distribution.) This
equivalence is shown in [3], where the asymptotic distributions are found for the
random trees that can be constructed as conditioned Galton-Watson trees, for
example random labelled trees and random binary trees. See also [6, 1, 7, 8]
for earlier results.

We will in this paper study the case of a complete binary tree. 

The complete binary tree with $n$ vertices has height $m = \lfloor \lg n \rfloor$; it has $2^k$
vertices of height $k$, $0 \le k < m$, and $n - 2^m + 1$ vertices of height $m$; moreover,
the vertices of height $m$ have the leftmost positions among the $2^m$ possible ones,
see e.g. [5, page 401]. We denote this rooted tree by $T_n$, and denote its root by $o$.

Let $\{x\} := x - \lfloor x \rfloor$ denote the fractional part of a real number $x$. Further,
for a vertex $v$ in a rooted tree, let $h(v)$ be its height (also known as depth), with
the root having height 0.




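Theorem 1.1. Suppose that $n \to \infty$ such that $\{\lg n - \lg \lg n\} \to \gamma \in [0, 1]$. Then

\[ \Big( X_v(T_n) - \frac{n}{\lg n} - \frac{n \lg \lg n}{\lg^2 n} \Big) \Big/ \frac{n}{\lg^2 n} \overset{d}{\longrightarrow} -W_{1-\gamma}, \tag{1} \]

where $W_\gamma$ has an infinitely divisible distribution with characteristic function

\[ E\, e^{itW_\gamma} = \exp\Big( i f(\gamma) t + \int_0^\infty \big( e^{itx} - 1 - itx \mathbf{1}[x < 1] \big)\, d\nu_\gamma(x) \Big), \tag{2} \]

where $f(\gamma) := 2^\gamma - 1 - \gamma$ and the Lévy measure $\nu_\gamma$ is supported on $(0, \infty)$ and has
density

\[ \frac{d\nu_\gamma}{dx} = 2^{\{\lg x + \gamma\}} x^{-2}. \]

The same result holds for $X_e(T_n)$.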



We prove Theorem 1.1 in Section 2. The strategy of the proof is to approximate
$X_v(T_n)$ and $X_e(T_n)$ by a sum of independent random variables derived from
$\{\lambda_v\}$, see Lemma 2.4. We will then apply a classical limit theorem for triangular
arrays.



Remark 1.1. Let $\tilde{X}_n$ denote the left hand side of (1). Instead of stating the result
for suitable subsequences, we may say that $\tilde{X}_n$ has approximately the same
distribution as $-W_{1 - \{\lg n - \lg \lg n\}}$ for large $n$; more precisely, the distance between the
two distributions (in for example the Lévy metric) tends to 0 as $n \to \infty$.

Remark 1.2. Most records occur at height close to the maximum $m \approx \lg n$, simply
because almost all vertices are there. On the other hand, it follows from the proof
below, see Lemma 2.4 and the proof of Lemma 2.5, that most of the random
fluctuations of $X_v(T_n)$ or $X_e(T_n)$ can be explained by the values at heights close to
$\lg \lg n$. The explanation is that a few values $\lambda_v$ at these heights will be so small that
they significantly reduce the number of records among their descendants. Vertices
of smaller height are too few, and there will usually not be any sufficiently small
value among them, while vertices of larger height affect only a small proportion of
the tree each, and the random effect caused by their values will be wiped out by the
law of large numbers.

Remark 1.3. It is easy to see [3] that $E X_v(T_n) = \sum_{v \in T_n} 1/(h(v) + 1)$ and $E X_e(T_n) =
\sum_{v \ne o} 1/h(v)$. These sums are easily evaluated as $n/m + O(n/m^2) = n/\lg n +
O(n/\lg^2 n)$. We see thus from Theorem 1.1 that $X_v(T_n)$ and $X_e(T_n)$ are
concentrated well above their means (at a distance of about $n \lg \lg n / \lg^2 n$), so that
e.g. $P(X_v(T_n) < E X_v(T_n)) \to 0$. This is connected to the fact that the limit $W_\gamma$
has infinite mean.

Note also that $E X_e(T_n) - E X_v(T_n) = \sum_{v \ne o} \frac{1}{h(v)(h(v)+1)} - 1 \sim n/\lg^2 n$,
while there is no similar difference in the limit distribution in Theorem 1.1.

An explanation of these facts is that the mean is affected by the unlikely event
that a vertex close to the root has an extremely small value $\lambda_v$, which would reduce
the number of records by a large amount.

We see that this behaviour makes it impossible to use the method of moments
to find the asymptotic distribution in Theorem 1.1, as we did for other trees in [3].
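The two means in Remark 1.3 are finite sums over the levels of the complete binary tree and can be evaluated exactly. The following sketch uses the level sizes stated in the Introduction ($2^k$ vertices at height $k < m$ and $n - 2^m + 1$ at height $m$); the function names are illustrative.

```python
from fractions import Fraction
from typing import List


def level_sizes(n: int) -> List[int]:
    """Number of vertices at heights 0..m of the complete binary tree T_n."""
    m = n.bit_length() - 1                 # m = floor(lg n)
    sizes = [2 ** k for k in range(m)]     # full levels 0 .. m-1
    sizes.append(n - 2 ** m + 1)           # last, possibly partial, level
    return sizes


def expected_vertex_records(n: int) -> Fraction:
    """Sum over vertices v of 1/(h(v) + 1)."""
    return sum((Fraction(c, k + 1) for k, c in enumerate(level_sizes(n))), Fraction(0))


def expected_edge_records(n: int) -> Fraction:
    """Sum over non-root vertices v of 1/h(v) (one edge above each such vertex)."""
    return sum((Fraction(c, k) for k, c in enumerate(level_sizes(n)) if k >= 1), Fraction(0))


for n in (1, 2, 3, 7, 1000):
    print(n, expected_vertex_records(n), expected_edge_records(n))
```

For moderate $n$ the exact values are still noticeably different from $n / \lg n$, in line with the slowly decaying correction terms mentioned in the remark.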

Remark 1.4. Recall that the Lévy measure $c x^{-2}\, dx$ gives a (weakly) 1-stable
distribution, see e.g. [2, XVII.3]; the measure $\nu_\gamma$ is a version of this with periodic
fluctuations, so the distribution of $W_\gamma$ is roughly similar to a 1-stable distribution.
More precisely, we have that if $W_\gamma$ and $W'_\gamma$ are independent with the same
distribution, then $W_\gamma + W'_\gamma \overset{d}{=} 2 W_\gamma + 2$, as is easily checked from (2), but the corresponding
statement for a sum of three copies of $W_\gamma$ is false.

If we write (2) as ^ a is possible to compute the Fourier 

coefficients of 'iljj{t) as a function of 7 by integrations, using Fubini and some 
Gamma integrals, and obtain 






-«E 

n/0 



r(27rm/ln2-l) __^2„^i^„^^ 

In 2 — 27rm 



In 2 1^|— 27rin/ In 2^27rni7 



where 7 * is Euler ^s constant. We omit the details. This, again, shows the affinity 
with stable distributions. 



The complete binary tree $T_n$ has minimal height among all binary trees with
$n$ vertices, but among binary trees with this height, it is maximally unbalanced.
The other extreme is the balanced binary tree $T^*_n$, where at each vertex, the two
subtrees emanating from it differ in size by at most 1. This tree too has height
$m = \lfloor \lg n \rfloor$, and the same number of vertices at each level as $T_n$. As a companion
to Theorem 1.1, we give a similar theorem for $T^*_n$; note that the results are similar
but not identical, which shows that the details of the structure of the tree are
important. (If we consider only $n$ of the form $2^k - 1$, $T_n = T^*_n$ is a full binary tree.
Indeed, Theorems 1.1 and 1.2 yield the same result in this case.) In contrast, note
that the means of $X_v$ and $X_e$ are the same for $T_n$ and $T^*_n$, see Remark 1.3.



Theorem 1.2. Suppose that n ^ 00 such that {Iglgn} ^ /3 G [0, 1]. Then 

n nlglgn\ / n a 

Ign 






Ig^n 



-)/ 



Ig^n 



-m-p 



where W\-p is as in Theorem 1.1. The same result holds for Xe(T*). 



The method used below applies also to other binary trees with minimal 
height, but we leave the details to the reader. Presumably, the method can be 
used also for a larger class of binary trees, but we have not explored this. In par- 
ticular, we do not know whether our methods can be used to solve the following 
problem. 



Problem 1.1. What is the asymptotic distribution of $X_v$ and $X_e$ for a (random)
binary search tree?



2. Proofs 

We first treat the case $X_v$ of Theorem 1.1 in detail, and then indicate the small
modifications needed for $X_e$ and for $T^*_n$.

Let $X_n := X_v(T_n)$, and let, for $y > 0$, $X_{n,y}$ be $X_v(T_n) - 1$ conditioned on
the root label $\lambda_o = y$, i.e. the number of records in the rest of the tree if we fix
the root label (which always is a record).

We will use the notations m := [IgnJ (as above) and I := [IglgnJ; we also 
let L := [| IglgnJ 31/2. We assume that n is so large that 0 < I < L < m.lf an 

are positive numbers and Zn random variables such that Zn/an 0 as n — > 00 , 
we write Zn = Op(a^). 




In the sequel, we will write $T$ instead of $T_n$. For a vertex $v \in T$, we let $T_v$ be
the subtree of $T$ rooted at $v$, and let $n_v$ be the number of vertices in $T_v$.

For later use we note that if we fix j < m and consider the 2^ vertices of height 
labelling them ui, . . . , V 23 from left to right, then, with qj := [(n— , 

Uy^ = i 2^-^' - 1 + 2^-^{(n - 2^ + i = Qj + 1, (3) 

(2"^-^-l, qj-\-l<i< 2 ^. 

We will further assume that the labels $\lambda_v$ have an exponential distribution
Exp(1) with mean 1; as remarked above, this does not affect the distribution of
$X_n$.
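Lemma 2.1. We have

\[ E X_{n,y} = \frac{n - 2^m + 1}{m} (1 - e^{-my}) + \sum_{k=1}^{m-1} \frac{2^{m-k}}{m-k} \big( 1 - e^{-(m-k)y} \big) \tag{4} \]

and, uniformly in $n$ and $y > 0$,

\[ \mathrm{Var}\, X_{n,y} = O(m^{-3} n^2). \]

Proof: Fix $y > 0$, and let, for each vertex $v \in T$, $I_v$ be the indicator that $\lambda_v$ is a
record, given that $\lambda_o = y$. Thus, $X_{n,y} = \sum_{v \ne o} I_v$. Let $j = h(v) > 0$ and let $v_0 = o, v_1, \ldots, v_j =
v$ be the vertices on the path from the root $o$ to $v$. Then $I_v = 1$ if and only if $\lambda_{v_j} < y$
and $\lambda_{v_i} > \lambda_{v_j}$ for $i = 1, \ldots, j - 1$. Hence, since $\lambda_{v_i} \sim \mathrm{Exp}(1)$ are independent,

\[ E I_v = \int_0^y \prod_{i=1}^{j-1} P(\lambda_{v_i} > x)\, e^{-x}\, dx = \int_0^y e^{-jx}\, dx = \frac{1 - e^{-jy}}{j}. \tag{5} \]

Consequently,

\[ E X_{n,y} = \sum_{j=1}^{m-1} 2^j\, \frac{1 - e^{-jy}}{j} + (n - 2^m + 1)\, \frac{1 - e^{-my}}{m}, \]

proving (4) by letting $j = m - k$.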
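The closed form in (5) can be compared with a direct numerical evaluation of the integral, using that the probability that $j - 1$ independent Exp(1) labels all exceed $x$ is $e^{-(j-1)x}$. The sketch below uses a plain midpoint rule; the function names and the sample values of $j$ and $y$ are illustrative.

```python
import math


def record_probability(j: int, y: float) -> float:
    """Closed form (1 - exp(-j y)) / j for a vertex at height j >= 1."""
    return (1.0 - math.exp(-j * y)) / j


def record_probability_numeric(j: int, y: float, steps: int = 20000) -> float:
    """Midpoint-rule value of the integral over [0, y] of P(label > x)^(j-1) e^(-x) dx."""
    h = y / steps
    total = 0.0
    for s in range(steps):
        x = (s + 0.5) * h
        survive = math.exp(-(j - 1) * x)   # the j-1 labels strictly between root and v exceed x
        total += survive * math.exp(-x)    # density of the label of v at x
    return total * h


for j, y in ((1, 0.5), (3, 0.5), (3, 2.0), (10, 2.0)):
    print(j, y, record_probability(j, y), record_probability_numeric(j, y))
```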

To estimate the variance, assume that v and w are two vertices in T of heights 
j = h{v) and k = h(w), and with their last common ancestor u at height i. 

Suppose first i < j and i < k. Let uq = o,ui, . . . ,Ui = u he the vertices on 
the path from o to u, and let Z := min{A^^^ : 1 < s < i}. Conditioned on Z, ly 
and Ijy are independent. Further, since v has height j — i above u, (5) yields 

1 _ p-U-i)(ZAy) 

E{ly I Z) = , 

J 







and similarly for Consequently, since Z ~ Exp(z ^), being the minimum of i 
independent Exp(l) variables, 









J 



j — ik — i 



= - - (1 - e~^y) - 7 (1 - 

1 —ik — 1\ 7 k^ ^ 



k — i J 

' (1 - (1 - dz 

+ e“*^(l-e 






( 6 ) 



+ T 



j -\-k — i 



.(1 +, 



-w _ ^-jy _ ^-{j+k-i)y 






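Say that the pair $(v, w)$ is good if $i \le m/3$ and $j, k \ge 2m/3$, and bad otherwise.
For a good pair $(v, w)$ we have, by (6) and (5),

\[ \begin{aligned}
\operatorname{Cov}(I_v, I_w) &= \mathbb{E} I_v I_w - \mathbb{E} I_v\,\mathbb{E} I_w \\
&= \frac{1 + O(i/m)}{jk}\Bigl(1 - e^{-jy} - e^{-ky} + e^{-(j+k-i)y} + O(i/m)\Bigr) - \frac{(1 - e^{-jy})(1 - e^{-ky})}{jk} \\
&= \frac{1}{jk}\bigl(e^{-(j+k-i)y} - e^{-(j+k)y}\bigr) + O(i/m^3) \\
&= O\bigl(m^{-2} e^{-my}\, i y\bigr) + O(i/m^3) = O(i/m^3).
\end{aligned} \tag{7} \]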



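For given $i$, $j$, $k$, there are at most $2^i$ choices of $u$ and then at most $2^{j-i}$ choices of
$v$ and $2^{k-i}$ of $w$; thus the total number of such pairs is at most $2^{j+k-i}$. Hence (7)
yields

\[ \sum_{\text{good } (v,w)} \operatorname{Cov}(I_v, I_w) = O\Bigl( \sum_{i=1}^{m} \sum_{j=1}^{m} \sum_{k=1}^{m} 2^{j+k-i}\,\frac{i}{m^3} \Bigr) = O(2^{2m} m^{-3}). \tag{8} \]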

The total number of bad pairs is at most

\[ \sum_{i > m/3,\; j,k \le m} 2^{j+k-i} + 2 \sum_{j < 2m/3,\; k \le m} 2^{j+k} = O(2^{5m/3}). \tag{9} \]



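For the bad pairs we simply use $\operatorname{Cov}(I_v, I_w) \le \mathbb{E} I_v I_w \le 1$, and obtain from (8)
and (9)

\[ \operatorname{Var} X_{n,y} = \sum_{v,w} \operatorname{Cov}(I_v, I_w) = O(2^{2m} m^{-3}) = O(m^{-3} n^2). \]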



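Let $\varphi(n, y) := \mathbb{E} X_{n,y}$, given by (4). In the next lemma, we find it useful to
be slightly more general than simply requiring $\tilde m = m$.

Lemma 2.2. If $2^{\tilde m} - 1 \le n \le 2^{\tilde m+1} - 1$, then

\[ \varphi(n, y) = \frac{n}{\tilde m}\bigl(1 - e^{-\tilde m y}\bigr) + \frac{2^{\tilde m+1}}{\tilde m^2} + O\bigl(\tilde m^{-2} e^{-\tilde m y/2} n + \tilde m^{-3} n\bigr). \]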







Proof: Let 



r\m—k 



CLk 



m — k 



(1 









jm—k 



m 






^-my _ (k-m)y 






For m/2 < k <mwe use = 0(2’^/^), and for k < m/2 

Summing over k, we see that (4) yields, using = 0(1), 

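\[ \begin{aligned}
\varphi(n, y) &= \frac{n - 2^m + 1}{m}\bigl(1 - e^{-my}\bigr) + \frac{2^m}{m}\Bigl(1 + \frac{2}{m} + O(m^{-2})\Bigr)\bigl(1 - e^{-my}\bigr) + O\Bigl(\frac{2^m}{m^2}\, e^{-my/2}\Bigr) \\
&= \frac{n}{m}\bigl(1 - e^{-my}\bigr) + \frac{2^{m+1}}{m^2} + O(m^{-3} n) + O\bigl(m^{-2} n e^{-my/2}\bigr).
\end{aligned} \]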

This proves the result when $\tilde m = m$. The only remaining case is $n = 2^{\tilde m} - 1$ and
$m = \tilde m - 1$; the result follows easily in this case too, for example by adding a vertex
$v$ at height $\tilde m$, using the case just considered, and subtracting $\mathbb{E} I_v = (1 - e^{-\tilde m y})/\tilde m$
from (5).

Recall that $L = \lfloor \tfrac{3}{2} \lg\lg n \rfloor \approx \tfrac{3}{2} \lg m$. Let $v_i$, $1 \le i \le 2^L$, be the $2^L$
vertices of height $L$, and let $n_i := n_{v_i}$. Note that $n_i = O(n/2^L)$. Further, let $Y_i$ be
the minimum of $\lambda_v$ along the path $P(v_i) = o \dots v_i$ from the root to $v_i$.



Lemma 2.3. With notations as above,

\[ X_n = \sum_{i=1}^{2^L} \varphi(n_i, Y_i) + o_p(m^{-2} n). \tag{10} \]

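Proof: We write the number of records $X_n$ as $V^* + V_1 + \dots + V_{2^L}$, where $V^*$ is the
number of records with height $\le L$ and $V_i$ is the number of records in $T_{v_i} \setminus \{v_i\}$.
If we condition on $\{\lambda_v : h(v) \le L\}$, then $V^*$ and all $Y_i$ become fixed, while
$V_i$, $1 \le i \le 2^L$, become independent random variables with $V_i \overset{d}{=} X_{n_i, Y_i}$.

Let $\mathcal{F}_L$ be the $\sigma$-field generated by $\{\lambda_v : h(v) \le L\}$. Then, by the comments
just made, $\mathbb{E}(V_i \mid \mathcal{F}_L) = \mathbb{E}(X_{n_i, Y_i} \mid Y_i) = \varphi(n_i, Y_i)$ and, with
$m_i := \lfloor \lg n_i \rfloor = \lg n - L + O(1) \sim m$,

\[ \begin{aligned}
\mathbb{E}\Bigl( \Bigl( \sum_{i=1}^{2^L} \bigl(V_i - \varphi(n_i, Y_i)\bigr) \Bigr)^2 \Bigm| \mathcal{F}_L \Bigr)
&= \sum_{i=1}^{2^L} \mathbb{E}\bigl( (V_i - \varphi(n_i, Y_i))^2 \bigm| \mathcal{F}_L \bigr)
 = \sum_{i=1}^{2^L} \operatorname{Var}\bigl(X_{n_i, Y_i} \bigm| Y_i\bigr) \\
&= \sum_{i=1}^{2^L} O(m_i^{-3} n_i^2) = O(2^L m^{-3} 2^{-2L} n^2) = O(m^{-3} 2^{-L} n^2) \\
&= O(m^{-9/2} n^2).
\end{aligned} \]

Taking the expectation, we find

\[ \mathbb{E}\Bigl( \sum_{i=1}^{2^L} \bigl(V_i - \varphi(n_i, Y_i)\bigr) \Bigr)^2 = O(m^{-9/2} n^2) \]

and thus

\[ X_n - V^* - \sum_{i=1}^{2^L} \varphi(n_i, Y_i) = o_p(m^{-2} n). \]

The result follows because also

\[ 0 \le V^* \le 2^{L+1} = O(m^{3/2}) = o(m^{-2} n). \qquad \Box \]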

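Next, let $\tilde m := m - L \sim m$. By (3) (with $j = L$), we can apply Lemma 2.2 to
each $n_i$ and $\tilde m$; this yields

\[ \varphi(n_i, Y_i) = \frac{n_i}{\tilde m}\bigl(1 - e^{-\tilde m Y_i}\bigr) + \frac{2^{\tilde m+1}}{\tilde m^2} + O\bigl(m^{-2} e^{-\tilde m Y_i/2} n_i + m^{-3} n_i\bigr). \tag{11} \]

Since $Y_i \sim \mathrm{Exp}(1/(L + 1))$, for every $a > 0$,

\[ \mathbb{E} e^{-aY_i} = \frac{L+1}{L+1+a}. \tag{12} \]

Hence

\[ \mathbb{E}\bigl| e^{-\tilde m Y_i} - e^{-m Y_i} \bigr| = \frac{L+1}{L+1+\tilde m} - \frac{L+1}{L+1+m} = O\Bigl(\frac{L^2}{m^2}\Bigr). \tag{13} \]

It follows easily from (11), (12) and (13) that

\[ \mathbb{E}\Bigl| \varphi(n_i, Y_i) - \frac{n_i}{\tilde m} + \frac{n_i}{\tilde m}\, e^{-mY_i} - \frac{2^{\tilde m+1}}{\tilde m^2} \Bigr| = O(L^2 m^{-3} n_i) = o(m^{-2} n_i). \]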



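Summing over $i$ we find, using Lemma 2.3, since $\sum_i n_i = n - (2^L - 1) = n - O(m^{3/2})$,

\[ \begin{aligned}
X_n &= \sum_{i=1}^{2^L} \Bigl( \frac{n_i}{\tilde m} - \frac{n_i}{\tilde m}\, e^{-mY_i} + \frac{2^{\tilde m+1}}{\tilde m^2} \Bigr) + o_p(m^{-2} n) \\
&= \frac{n}{m - L} - \frac{1}{m - L} \sum_{i=1}^{2^L} n_i e^{-mY_i} + 2^L\, \frac{2^{m-L+1}}{(m - L)^2} + o_p(m^{-2} n) \\
&= \frac{n}{m} + \frac{nL}{m^2} - \frac{1}{m} \sum_{i=1}^{2^L} n_i e^{-mY_i} + \frac{2^{m+1}}{m^2} + o_p(m^{-2} n).
\end{aligned} \tag{14} \]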



We transform this once more. 

Lemma 2.4.

\[ X_n = \frac{n}{m} + \frac{n}{m^2}\, L - \frac{1}{m} \sum_{h(v) \le L} n_v e^{-m\lambda_v} + \frac{2^{m+1}}{m^2} + o_p(m^{-2} n). \tag{15} \]







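Proof: We recall that each $Y_i$ is the minimum of the $L + 1$ independent variables
$\lambda_v$, $v \in P(v_i)$; thus $e^{-mY_i}$ is the maximum of the corresponding $e^{-m\lambda_v}$. Let
$a = 2 \ln m / m$. The probability that at least two $\lambda_v$, $v \in P(v_i)$, are less than $a$ is
$O(L^2 a^2) = O(\ln^4 m / m^2)$; hence the probability that this happens for some $i$ is
$O(2^L \ln^4 m / m^2) = o(1)$. With probability tending to 1, there is thus at most one
$\lambda_v$ less than $a$ in each $P(v_i)$, and in this case,

\[ 0 \le \sum_{v \in P(v_i)} e^{-m\lambda_v} - e^{-mY_i} \le L e^{-ma} = L/m^2, \]

and thus,

\[ \begin{aligned}
\sum_{i=1}^{2^L} n_i e^{-mY_i} &= \sum_{i=1}^{2^L} \sum_{v \in P(v_i)} n_i e^{-m\lambda_v} + O(nL/m^2) \\
&= \sum_{h(v) \le L} e^{-m\lambda_v} \sum_{i : v \in P(v_i)} n_i + O(nL/m^2) \\
&= \sum_{h(v) \le L} e^{-m\lambda_v} n_v + O(nL/m^2),
\end{aligned} \]

because $n_v - 2^L < \sum_{i : v \in P(v_i)} n_i \le n_v$. Hence,

\[ \sum_{i=1}^{2^L} n_i e^{-mY_i} = \sum_{h(v) \le L} e^{-m\lambda_v} n_v + o_p(n/m), \]

and the result follows from (14).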

The sum in (15) is a sum of independent random variables. The proof will be 
completed by a classical result on convergence of such sums for triangular arrays 
to infinitely divisible distributions, see e.g. [4, Theorem 15.28]. 

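We write, for convenience, $\xi_v := \frac{m n_v}{n}\, e^{-m\lambda_v}$; further write $\alpha_n := \{\lg n\}$
and $\beta_n := \{\lg\lg n\}$. Thus $\lg n = m + \alpha_n$ and $\lg m = \lg\lg n + o(1) = l + \beta_n + o(1)$.
We then have, by Lemma 2.4,

\[ \begin{aligned}
\frac{m^2}{n}\Bigl( X_n - \frac{n}{\lg n} - \frac{n \lg\lg n}{\lg^2 n} \Bigr)
&= m^2\Bigl( \frac{1}{m} - \frac{1}{\lg n} \Bigr) + L - m \sum_{h(v) \le L} \frac{n_v}{n}\, e^{-m\lambda_v} + \frac{2^{m+1}}{n} - \lg\lg n + o_p(1) \\
&= \alpha_n + L - l - \beta_n - \sum_{h(v) \le L} \xi_v + 2^{1-\alpha_n} + o_p(1).
\end{aligned} \tag{16} \]

Since $m / \lg n \to 1$, it is thus enough to show that this converges in distribution to
$-W_\gamma$ as $n \to \infty$ with $\{\lg n - \lg\lg n\} \to \gamma$.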

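By considering subsequences, we may assume that the limits $\alpha := \lim \alpha_n$ and
$\beta := \lim \beta_n$ exist. Thus $\lg n = m + \alpha + o(1)$ and $\lg m = \lg\lg n + o(1) = l + \beta + o(1)$.
Note that $\lg n - \lg\lg n = m - l + \alpha - \beta + o(1)$; thus $\gamma \equiv \alpha - \beta \pmod 1$ and more
precisely,

\[ \gamma = \begin{cases} \alpha - \beta & \text{if } \alpha > \beta, \\ \alpha - \beta + 1 & \text{if } \alpha < \beta, \\ 0 \text{ or } 1 & \text{if } \alpha = \beta. \end{cases} \tag{17} \]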







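Lemma 2.5. Suppose that $n \to \infty$ such that $\alpha_n \to \alpha$ and $\beta_n \to \beta$ for some $\alpha$ and
$\beta$ in $[0, 1]$, and let $h := 2^{\beta - \alpha}$. Then

(i) $\sup_v \mathbb{P}(\xi_v > x) \to 0$ for every $x > 0$. (I.e., $\{\xi_v\}$ form a null array.)

(ii) $\sum_{h(v) \le L} \mathbb{P}(\xi_v > x) \to \nu_\gamma(x, \infty)$ for every $x > 0$.

(iii) $\sum_{h(v) \le L} \mathbb{E}\bigl(\xi_v \mathbf{1}[\xi_v \le h]\bigr) - (L - l + 2^{1-\alpha} + \alpha - \beta) \to \beta - \alpha$.

(iv) $\sum_{h(v) \le L} \operatorname{Var}\bigl(\xi_v \mathbf{1}[\xi_v \le h]\bigr) \to 3h/2$.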

Before proving this lemma, we show how it implies Theorem 1.1. Let C := 
L — I 2^”^ a — (3. We apply [4, Theorem 15.28] with a = 0 and b = /(y) to 

X)/i(t;)<L^^ ~ —C/n deterministic. (Note that Cjn 0; thus 

{^^}U{^'} is a null array.) We have dv^/dx = when 

2'^h < X < and thus 




3h 

T‘ 



Similarly, if^<asol/2</i<l, then 



f xdu^{x)= ( 2"^ ^ dx = 2^ ^ 

Jh Jh 

while if > a so 1 < h < 2, then 

J xdi'^{x) ~ ~ xdu^{x) = 2^+^“^(l — h) = 2{2^~^ — 1). 

It follows, using (17) and /(O) = /(I), that in both cases 

fi'y)— [ xdi/^{x) = 2^^ — 1 — j — f X dur^{x) = (3 — a. 

Jh Jh 

It is now easy to see from Lemma 2.5 that the conditions of [4, Theorem 15.28] 
are satisfied, and consequently 

n 

^.-(L-/ + 2i-“ + a-/3)= Y + 

h{v)<L h{v)<L i=l 

Theorem 1.1 now follows by (16). 

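Proof of Lemma 2.5: For any $x > 0$,

\[ \mathbb{P}(\xi_v > x) = \mathbb{P}\Bigl( e^{-m\lambda_v} > \frac{nx}{m n_v} \Bigr) = \mathbb{P}\Bigl( \lambda_v < \frac{1}{m} \ln \frac{m n_v}{nx} \Bigr) = 1 - \exp\Bigl( -\frac{1}{m} \ln_+ \frac{m n_v}{nx} \Bigr). \tag{18} \]

This shows first that for every $x > 0$,

\[ \mathbb{P}(\xi_v > x) \le \frac{1}{m} \ln_+ \frac{m n_v}{nx} \le \frac{1}{m} \ln_+ \frac{m}{x}, \tag{19} \]

which proves (i).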

On a given level j < m there are, by (3), qj = n2^~'^ — 2^ + 0(1) = (2^^ — 
1)2^' -h 0(1) vertices with Uy = - 1, and V -qj -1^(2- 2^-)2^‘ + 0(1) 

vertices with ny = — 1. There is one additional vertex with an intermediate 

Uy (which could coincide with one of the two main values); for convenience we 







call such a vertex bad. We also call a vertex v with Uy > (which requires 

j < //2) bad. All other vertices v with h{v) < L are good. The good vertices thus 
have riy = 2'^~^ — 1 for some k with 1/2 < k < L. For 1/2 < k < L, there are 
(2— 2‘^"^)2^+0(l) such vertices with = fc and (2^^ — 1)2^“^^ +0(1) with h{v) = 
fc + 1; thus together 2^+"^ + 0(1). For k = L, there are only (2 — 2^"^)2^ + 0(1) 
such vertices, since we require h{v) < L. In other words, 



#{v good : riy = 2"^ ^ 



|2fc+a. +0(1), l/2<k<L, 

> \(2-2^-)2^ + 0(l), k = L. 



( 20 ) 



The number of bad vertices is 0(L + 2^/^) = 0(m^/^). By (19), F{^y > 
x) = O (In m/m) for every fixed x > 0. Hence the sum over bad vertices in (ii) is 
0(m“^/^ In m) = o(l). 

Similarly, using (19) again, 

1 . 1 21nm ^/lnm\ . . 

E < - + /iP > -) <- + h = 0 ). 21 

^ ^ m V m/ m m \ m / 

and 

Var+l[^« < h]) < < h]) = (22) 

Consequently, the sum over bad is o(l) in (ii), (iii) and (iv), so we may in 
the sequel ignore them and consider only good vertices. 

Fix X > 0. Then, by (18) and (19), 

P«„>x) = il„,(^)(l + o(!!l=)) (23) 

m \ nx / \ \ m J J 

If fc > L, then m2^~^ < 2^+^+’^“^ < nx, provided n is large enough. Thus, for 
large n, by (20) and (23), with all o(l) uniform in fc for fixed x, 

X: r({„ > X) = (1 +o(D) f: (2^^-. + 0(1))1 J 

V good k=l/2 

= (l + o(l)) ^ +o(l) 

k>ll2 

= (l + o(l)) 2-*+"-^ln+(2*-“+^+°(^)x-i) +o(l) 

i<ll2 

oo 

F{x) := y]2-*+““^ln+(2*-“+^x-^). 



— OO 







Let j := [Igx -f o — /3J; thus 2^“^^ ^ < x < 2^^^ and 



F{x)= 2-*+“-^ln(2 






= Y 2-^-^+“-^ (A: In 2 + ln{2^-^+^x-^)) 

k=l 

= 2--^+“-^(2 + lg ( 2 -?-“+^ x - i )) ln 2 
^ 2«-/3-Ligx+a-/3j (2 - {igx + a - p}) In 2 
= 2T'-L'8 *+t'J (2 - {lg:c + 7» In 2. 

Note that F{x) is continuous and decreasing with F{x) —f 0 as a; — > oo. The 
derivative is 

_ _l2'^“bga:+7j — _^-22{lgrr+7}^ 
dx X 

Thus F{x) = Uj{x,oo), which proves (ii). 

For (iii) and (iv) we calculate, for s > 0, 

poo 

E(e-mA„i^g-mA„ < ^-1]^ ^ e~^ dx 



— ^ ln+ 5 _ 

m + 1 m + 1 



and, similarly. 



^ 2m + 1 



which gives 



Var(e-^^n[e-"^^- < ^“^]) = {l + 0{m~^)). (25) 



If is a good vertex with riy = 2'^ ^ — 1 = 2’^ k+o(i)^ 



'^'^v _ 2(^+/5) + (m-fc)-(m+a)-(/3— a)+o(l) _ 2^-^+o(l) 

nh ’ 



and thus, by (24), 



< h]) = 



^-rriA, < 



(m + l)n 

Note in particular that if fc > Z + 1, then < 1 for large n, and thus ^ h and 



< h]) = ^ = 2- 

^ ^ (m + l)n 



(l + o(i)). (26) 







It follows from (20), (24) and (26) that, with o and O uniform in fc, 

Y. < ft]) 

V good 

I L-1 

= Y + Y (2''+“" +0(l))2“*'~“"(l + 0(m“^)) 

k=l/2 k=l+l 

+ ((2 - 2“")2^ + 0(l))2“-^-“+‘’(i) 

I L-1 

= Y + Y (1 + C»(m-1)) +2^-“-l + o(l) 

k=l/2 k=l-hl 

= 2 -h L - 1 - Z + 2^-^ - 1 4- o(l) = L - / 4- 2^"^ 4- o(l). 



Similarly, using (25), 



Var(^^[^„ < /ij) = Var(e-"^^n 



^—mXv 



< 



nh 1 



mrio, 



2 2 
m^n 



2m'n? 



^ 2 ~( 2 + 1 /^)(^~^) ++o( 1) 2a— 2(Z — fc)-|_+o(l) 



and 



Y Var(^^[^„ < ft]) == Y 2''+“+°(^)2'+^-i-2fc-2a-2(z-fe)++o(i) 

V good k=lj2 

oo 

_ ^ 2^-^-2(Z-fc)++/3-a-l+o(l) 

k= — oo 

= 3 • + o(l) = 3h/2 + o(l). 



This completes the proof of Lemma 2.5. 

We have proved Theorem 1.1 for $X_v$. For $X_e$, the only difference is that $\lambda_o$ is
ignored, and thus $Y_i \sim \mathrm{Exp}(1/L)$. The estimates in (12) and (13) remain valid, and
thus (15) and (16) still hold, summing over $v \ne o$ only. Since $\xi_o \to 0$ in probability by
Lemma 2.5(i), this makes no difference for the asymptotics of the distribution. (But
note that $\mathbb{E}\,\xi_o \to 1$, and that the means differ correspondingly, see Remark 1.3.)

For the completely balanced tree (Theorem 1.2), every vertex $v$ with $h(v) = k$
has $2^{-k} n - 2 < n_v \le 2^{-k} n$. We call all vertices with $l/2 \le h(v) \le L$ good, and
replace (20) by

\[ \#\{v \in T_n \text{ good} : n_v = 2^{-k} n + O(1)\} = 2^k, \qquad l/2 \le k \le L. \tag{27} \]

The remaining calculations hold as above, provided we replace $\alpha_n$ and $\alpha$ by 0 and
thus $\gamma$ by $1 - \beta$.



References 

[1] P. Chassaing & R. Marchand. In preparation.

[2] W. Feller, An Introduction to Probability Theory and Its Applications. Vol. II. Second
edition, Wiley, New York 1971.

[3] S. Janson, Random cutting and records in deterministic and random trees. Preprint,
2003. Available from http://www.math.uu.se/~svante/papers

[4] O. Kallenberg, Foundations of Modern Probability. 2nd ed., Springer-Verlag, New
York, 2002.

[5] D.E. Knuth, The Art of Computer Programming. Vol. 1: Fundamental Algorithms.
3rd ed., Addison-Wesley, Reading, Mass., 1997.

[6] A. Meir & J.W. Moon, Cutting down random trees. J. Australian Math. Soc. 11
(1970), 313-324.

[7] A. Panholzer, Cutting down very simple trees. Preprint, 2003.

[8] A. Panholzer, Non-crossing trees revisited: cutting down and spanning subtrees. Proceedings,
Discrete Random Walks 2003, Cyril Banderier and Christian Krattenthaler,
Eds., Discr. Math. Theor. Comput. Sci. AC (2003), 265-276.

[9] A. Rényi, On the extreme elements of observations. MTA III. Oszt. Közl. 12
(1962), 105-121. Reprinted in Collected Works, Vol. III, pp. 50-66, Akadémiai Kiadó,
Budapest, 1976.

Svante Janson 

Department of Mathematics, Uppsala University, PO Box 480, S-751 06 Uppsala, 

Sweden 

Email: svante.janson@math.uu.se 

http://www.math.uu.se/~svante/




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Multidimensional Interval Trees 

Mehri Javanian and Mohammad Q. Vahidi-Asl 



ABSTRACT: The binary interval tree is a random structure that underlies
interval division and parking problems. A generalization to trees underlying volume
partition is investigated in this paper; the size of the associated tree is studied. In
d dimensions (arbitrarily high), the moment generating function of the size of the
tree and several pruned variants is shown to satisfy a partial differential equation
of order d. So, the situation is considerably more complex than the linear case
that arises in one dimension. The paper addresses volume partition by points, and
this case can be solved, in contrast with a well-known parking problem, where
the partition is done by solid objects; the parking analog remains unsolved, but a
variation is shown to be tractable, which may guide research toward an eventual
solution to the standard parking problem. The multidimensional interval tree
can be viewed as a continuous analog of the discrete quadtrees.



1. Introduction 

The interval tree is a tree associated with repeated division of an interval of
volume $x_1 \cdots x_d$ until it is partitioned into parts of volume less than 1.

Suppose each $x_i$ is at least 1. The volume $x_1 \cdots x_d$ is represented by the root
(a distinguished internal node) of a $2^d$-branching tree. Divide the interval into
$2^d$ random orthants by choosing $Q = (U_1, \dots, U_d)$ uniformly at random from the
interval. Let us canonically label the $2^d$ orthants in an arbitrary way. The $i$th
subtree is then associated with the $i$th orthant. Each subtree grows recursively on
the volume of the interval it represents. The process continues in the $2^d$ subtrees
all the way to the leaves, where a leaf stands for an interval with volume less than
1.

The binary interval tree was introduced in Sibuya and Itoh (1987).
Recently, Itoh and Mahmoud (2002) considered incomplete or one-sided variants of
the (1-dimensional) interval tree. In $d$ dimensions, the interval tree has $2^d$ subtrees.
We consider here a few such incomplete variants (corner preference, proportionate
preference and no preference) to show that the techniques can be extended.



2. Main results

In an interval tree, suppose that at each node all the $2^d$ subtrees are pruned, except
one specified by a prescribed program. More precisely, let $J_0, J_1, \dots$ be any given
pruning sequence of independent random variables, each with an arbitrary discrete
distribution on the set of integers $\{1, \dots, 2^d\}$. At the root node, all the subtrees are
pruned, except the $J_0$th subtree. At the root of the $J_0$th subtree, all the subtrees
are pruned, except the $J_1$st, and so forth. We call the tree so obtained an incomplete
interval tree grown from the pruning sequence $J_0, J_1, \dots$.







For example, the deterministic sequence 1, 1, 1, ... corresponds to the consistent
choice of the leftmost subtree (corner preference). The pruning process leaves
behind the leftmost path in the full tree. All such paths are determined
by keeping a random subtree and pruning all the rest. We are interested
in the size $S_V(J_0, J_1, \dots)$ of the incomplete interval tree, grown under the pruning
sequence $J_0, J_1, \dots$. It is sufficient for this purpose to study the length of the path
of the corner preference incomplete tree $S_V = S_V(1, 1, \dots)$, because $S_V(J_0, J_1, \dots)$
and $S_V(1, 1, \dots)$ have the same distribution, owing to the symmetry of the $2^d$ subtrees.
We start from a stochastic recurrence for the size of the corner preference
incomplete tree $S_V$:

\[ S_V = 1 + S_{V_1}, \quad \text{for } V > 1, \]

where $V_1 = U_1 \cdots U_d$ is the volume of the first orthant. The boundary conditions
are $S_1 = 1$, and $S_V = 0$ if $V < 1$. Let $\varphi_V(t)$ be the moment generating function
of $S_V$.
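The recurrence lends itself to direct simulation. The sketch below is illustrative only: it assumes the first orthant is the box $[0, U_1] \times \cdots \times [0, U_d]$ with each $U_i$ uniform on $[0, x_i]$, and the function names are not from the paper. It follows the corner path until the volume drops below 1 and compares the sample mean with $\ln V / d$.

```python
import math
import random


def corner_path_size(sides, rng):
    """Size S_V of the corner preference tree: S_V = 1 + S_{V_1}, S_V = 0 if V < 1."""
    sides = list(sides)
    size = 0
    while math.prod(sides) >= 1.0:
        size += 1  # the current box has volume at least 1, so it is an internal node
        # keep the corner orthant [0, U_1] x ... x [0, U_d]
        sides = [rng.uniform(0.0, x) for x in sides]
    return size


def mean_corner_path_size(sides, trials, rng):
    """Monte Carlo estimate of E[S_V] for a box with the given side lengths."""
    return sum(corner_path_size(sides, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    rng = random.Random(7)
    d = 2
    sides = [math.exp(5.0)] * d  # volume V = e^10, so ln V / d = 5
    estimate = mean_corner_path_size(sides, 20000, rng)
    print(estimate, math.log(math.prod(sides)) / d)
```

For moderate volumes the sample mean differs from $\ln V / d$ by a bounded amount, which is compatible with a first-order asymptotic statement.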



Theorem 2.1. Let Sy be the size of a random corner preference interval tree grown 
on the interval of volume V. As V — > oo, 



Q In V 
-d~ 






Corollary 2.2.

\[ \mathbb{E}[S_V] \sim \frac{\ln V}{d}, \qquad \operatorname{Var}[S_V] \sim \frac{\ln V}{d^2}. \]



We have derived similar results for proportionate and uniform preference
incomplete interval trees.



References 

[1] Sibuya, M. and Itoh, Y. (1987). Random sequential bisection and its associated 
binary tree. Annals of the Institute of Statistical Mathematics , 39, 69-84. 

[2] Itoh, Y. and Mahmoud, H. (2003+). One-sided variations on interval trees. Journal
of Applied Probability (accepted).

Mehri Javanian and Mohammad Q. Vahidi-Asl 

Department of Statistics, Shahid Beheshti University, Tehran, Iran 
javanian_m@yahoo.com







Edit Distance between Unlabelled 
Ordered Trees 

Anne Micheli and Dominique Rossin 



1. Definitions 

1.1. Sorted permutations 

We introduce sorted permutations [1, 2] and show that they are in one-to-one 
correspondence with ordered trees. 

Definition 1.1. Let $n \in \mathbb{N}$. A sorted permutation of $\{1, \dots, n\}$ is a permutation
$\sigma$ such that $\sigma = I\,n\,J$ where $I$ and $J$ are sorted permutations on $\{1, \dots, p\}$ and
$\{p + 1, \dots, n - 1\}$ respectively. Notice that $I$ or $J$ could be empty.

Theorem 1.2. Sorted permutations are in one-to-one correspondence with rooted
ordered trees. The numbering of the edges corresponds to a postfix Depth First
Traversal (DFT). Then the sorted permutation is obtained by a prefix Depth First
Traversal of the labeled tree.

If $\sigma$ is a sorted permutation, $T(\sigma)$ denotes the tree associated to $\sigma$.

Definition 1.3. A subsequence of a permutation $\sigma = \sigma_1 \dots \sigma_n$ is a word $\sigma' =
\sigma_{i_1} \dots \sigma_{i_k}$ where $i_1, \dots, i_k$ is an increasing sequence of $\{1, \dots, n\}$. Let $\Phi$ be the
bijective mapping of $\{\sigma_{i_1}, \sigma_{i_2}, \dots, \sigma_{i_k}\}$ on $\{1, \dots, k\}$ preserving the order on the $\sigma_{i_j}$.
The normalized subsequence (pattern) is equal to $\Phi(\sigma')$.

Remark 1.4. The sorted permutations are the permutations avoiding the normalized
subsequence (pattern) 231 [3].
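For small sizes the recursive definition and the pattern-avoidance characterization can be compared by brute force. The following sketch uses hypothetical helper names; it tests the decomposition $\sigma = I\,n\,J$ recursively, tests avoidance of 231 directly, and counts the permutations that pass.

```python
from itertools import combinations, permutations


def is_sorted_perm(word):
    """Recursive test: word = I + (max,) + J with every entry of I below every entry of J."""
    if not word:
        return True
    pos = word.index(max(word))
    left, right = word[:pos], word[pos + 1:]
    if left and right and max(left) > min(right):
        return False
    return is_sorted_perm(left) and is_sorted_perm(right)


def avoids_231(word):
    """True if no positions i < j < k satisfy word[k] < word[i] < word[j]."""
    return not any(c < a < b for a, b, c in combinations(word, 3))


def count_sorted(n):
    """Number of sorted permutations of {1, ..., n} found by exhaustive search."""
    return sum(1 for p in permutations(range(1, n + 1)) if is_sorted_perm(p))


if __name__ == "__main__":
    for n in range(1, 7):
        perms = list(permutations(range(1, n + 1)))
        agree = all(is_sorted_perm(p) == avoids_231(p) for p in perms)
        print(n, count_sorted(n), agree)
```

The counts can be compared with the number of rooted ordered trees with $n$ edges, in line with Theorem 1.2.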

1.2. Edit distance 

Given two unlabeled trees, the edit distance is the minimal number of operations 
to transform one into the other. The operations are Deletion: this is the contraction 
of an edge; two vertices are merged. Only one label is kept. Insertion: this is the 
converse operation of deletion. 



2. Distance on sorted permutations 

A factor of a permutation $\sigma = \sigma_1 \sigma_2 \dots \sigma_n$ is a factor of the word $\sigma_1 \sigma_2 \dots \sigma_n$, i.e.,
a word of the form $\sigma_k \sigma_{k+1} \dots \sigma_{k+l}$. A compact factor $f$ is a factor such that its
elements are a permutation of an interval of $\mathbb{N}$. A complete factor of $\sigma$ is a compact
factor $f$ such that there is no non-empty factor $g$ verifying that $fg$ is compact and
the greatest element of $fg$ is equal to the greatest element of $f$.

Take for example the sorted permutation $\sigma = (1524376)$. The complete
factors of $\sigma$ are {1}, {15243}, {1524376}, {5243}, {524376}, {2}, {243}, {43}, {3},
{76}, {6}.




258 



Anne Micheli and Dominique Rossin 



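Let $\sigma = \sigma_1 \dots \sigma_k$ be a word of $\{1 \dots n\}$ and $a$ be a letter of $\{1 \dots n\}$. We
denote by $[\sigma]_a$ the word $\sigma'_1 \dots \sigma'_k$ where

\[ \sigma'_i = \begin{cases} \sigma_i & \text{if } \sigma_i < a, \\ \sigma_i + 1 & \text{otherwise.} \end{cases} \]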

Definition 2.1. We define two operations on a: 

1. Deletion: Let I <k <n. The deletion {k A) is the removal of ak in a 
and the renormalization on Sn-i of the result. 

2. Insertion: (A 0) corresponds to the transformation of a = 0 into 
g' = [\). If g ^ 0, let f be a complete factor of g. Then, g — ufv with 
u,v factors of g. The resulting permutation is g' : 

(a) (A /); G^ = [u\aCtf[v]a, a = max{f} -h L This corresponds to the 

insertion of an inner edge with T{f) as subtree. 

(b) (A ^ /); g' = [u]afcL[v]a, a = max{f} + 1. This corresponds to the 
insertion of a free edge as the right brother of7{f). 

(c) (A /); g' = [u]aa[f]a[v]a, a = min{f}. This corresponds to the 
insertion of a free edge as the left brother of7{f). 

Proposition 2.2. The Deletion/Insertion algorithm yields a sorted permutation. 
Moreover insertion and deletion are inverse operations. 

Theorem 2.3. The edit distance between two sorted permutations $\sigma_1$ and $\sigma_2$ is the
edit distance between the associated ordered trees and is equal to $|\sigma_1| + |\sigma_2| - 2|u|$
where $u$ is a largest normalized subsequence (pattern) common to $\sigma_1$ and $\sigma_2$.

Corollary 2.4. Finding the greatest common pattern between two sorted permutations
is polynomial.

It is proved in [4] that finding the greatest common pattern between two
permutations is NP-complete. We prove here that the problem becomes polynomial
when restricted to sorted permutations, i.e., (132) or (231) avoiding permutations.
In fact, the algorithm of Zhang and Shasha [5] on trees solves the problem on
sorted permutations because the algorithm outputs not only the distance but also
the greatest common subtree.



3. Generating function of the edit distance between sorted 
permutations and Id = 1 2 ... n 

We denote by Si{t,q) the generating function of sorted permutations where t 
counts the size of the permutation and q the edit distance between sorted permu- 
tations and Id. This is the distance between a tree and the trivial one which is 
made of n edges and of height 1. 



C X 1 + (9^ - 1)^ - - 1)^^^ - 2(9^ + 1)^ + 1 

Theorem 3.1. The average edit distance between rooted planar trees with n edges 
and n, n — 1, . . . , 2, 1 is n — 1. 

In [6], the average height of a planar tree with $n$ edges is determined
analytically to be $\sqrt{\pi n} - \tfrac{1}{2}$. Thus from [7], we obtain that the average edit
distance is $2(n - \sqrt{\pi n} + \tfrac{1}{2}) \sim 2n$.
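A small brute-force experiment can illustrate the distance to $\mathrm{Id} = 1\,2 \dots n$ named in the section heading. The sketch rests on two assumptions that are not spelled out in the text: the distance is computed with the formula of Theorem 2.3, and a largest pattern common to $\sigma$ and Id is a longest increasing subsequence of $\sigma$, so that the distance equals $2(n - \mathrm{LIS}(\sigma))$. All function names are illustrative.

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import permutations


def is_sorted_perm(word):
    """Recursive test of the decomposition word = I + (max,) + J with I below J."""
    if not word:
        return True
    pos = word.index(max(word))
    left, right = word[:pos], word[pos + 1:]
    if left and right and max(left) > min(right):
        return False
    return is_sorted_perm(left) and is_sorted_perm(right)


def lis_length(word):
    """Length of a longest strictly increasing subsequence (patience sorting)."""
    tails = []
    for value in word:
        slot = bisect_left(tails, value)
        if slot == len(tails):
            tails.append(value)
        else:
            tails[slot] = value
    return len(tails)


def distance_to_identity(word):
    """|sigma| + |Id| - 2|u| with u taken to be a longest increasing subsequence."""
    return 2 * (len(word) - lis_length(word))


def mean_distance_to_identity(n):
    """Exact average of the distance over all sorted permutations of size n."""
    dists = [
        distance_to_identity(p)
        for p in permutations(range(1, n + 1))
        if is_sorted_perm(p)
    ]
    return Fraction(sum(dists), len(dists))


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, mean_distance_to_identity(n))
```

Under these assumptions the exact averages for small $n$ can be compared with the value $n - 1$ appearing in Theorem 3.1.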







References 

[1] M. Bousquet-Mélou. Sorted and/or sortable permutations. Disc. Math., 225:25-50,
2000.

[2] J. West. Permutations and restricted subsequences and Stack-sortable permutations.
PhD thesis, M.I.T., 1990.

[3] D.E. Knuth. The Art of Computer Programming: Fundamental Algorithms, page 533.
Addison-Wesley, 1973.

[4] P. Bose, J.F. Buss, and A. Lubiw. Pattern matching for permutations. Inf. Proc.
Letters, 65:277-283, 1998.

[5] K. Zhang and D. Shasha. Simple fast algorithms for the editing distance between
trees and related problems. SIAM J. Comput., 18(6):1245-1262, Dec. 1989.

[6] N.G. De Bruijn, D.E. Knuth, and S.O. Rice. Graph theory and Computation, chapter
The average height of planted plane trees. Academic Press, 1972.

[7] E. Roblet and X.G. Viennot. Théorie combinatoire des T-fractions et approximants
de Padé en deux points. Disc. Math., 153:271-288, 1996.

Anne Micheli and Dominique Rossin 

CNRS, LIAFA, Universite Paris 7, 

2 Place Jussieu, 

75251 PARIS Cedex 05, 

FRANCE 







On Parameters in Monotonically Labelled Trees 

Katherine Morris 



ABSTRACT: Let $T$ be a rooted tree structure with $n$ nodes $t_1, \dots, t_n$. A
function $f : \{t_1, \dots, t_n\}$ into $\{1 < \cdots < k\}$ is monotone if whenever $t_i$ is a
descendant of $t_j$ then $f(t_i) \ge f(t_j)$. Two grand averages, the size of the ancestor
tree and the Steiner distance, are determined for some tree structures.



Binary Trees. We consider binary trees whose nodes are monotonically labelled
with $1, \dots, k$ as described in [4], with generating functions $y_1(z) =
1 + z y_1^2(z), \dots, y_k(z) = y_{k-1}(z) + z y_k^2(z)$. The asymptotic behaviour of the
coefficients of $y_k$ was analyzed in [4]: by determining the singularities $q_k$ of $y_k$
nearest to the origin, given by $q_{i+1} = q_i(1 - q_i)$ with $q_1 = 1/4$, we have $y_k(z) =$
+ Ck = 2 (r 2 -. = Apk^/W^ and rfe = l~Mk- 

Moreover, from [2], we know that yi{qk) = 0 <i < k. 
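The recursive system for the generating functions is easy to explore numerically. The sketch below is illustrative and uses names that do not appear in the paper; it assumes the constant series 1 as the base of the iteration, so that the first step reproduces $y_1 = 1 + z y_1^2$. It computes the first Taylor coefficients of $y_k$ and the singularities $q_k$ from $q_{i+1} = q_i(1 - q_i)$, $q_1 = 1/4$.

```python
from fractions import Fraction


def monotone_binary_series(k, terms):
    """First `terms` Taylor coefficients of y_k, where y_j = y_{j-1} + z * y_j**2."""
    prev = [1] + [0] * (terms - 1)  # assumed base: the constant series 1
    for _ in range(k):
        cur = [0] * terms
        for n in range(terms):
            # coefficient of z**n in z * y_j**2 is a convolution of lower coefficients
            conv = sum(cur[i] * cur[n - 1 - i] for i in range(n))
            cur[n] = prev[n] + conv
        prev = cur
    return prev


def dominant_singularities(k):
    """q_1, ..., q_k from q_1 = 1/4 and q_{i+1} = q_i * (1 - q_i)."""
    values = []
    q = Fraction(1, 4)
    for _ in range(k):
        values.append(q)
        q = q * (1 - q)
    return values


if __name__ == "__main__":
    print(monotone_binary_series(1, 6))
    print(monotone_binary_series(2, 6))
    print(dominant_singularities(3))
```

Since $q_k$ is the singularity of $y_k$ nearest to the origin, the ratio of consecutive coefficients of $y_k$ should approach $1/q_k$, which gives a quick plausibility check on computed coefficients.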

The first parameter we analyze is the size of the ancestor tree. Consider a 
tree, T, and select p random nodes in it. The ancestor tree is the subtree of T 
which is spanned by the root and the p chosen nodes. 



Theorem 3. The size of the ancestor tree in binary trees satisfies
$B(z,u,v) = zv(1 + u)B^2(z,u,v) - zvT^2(z) + T(z)$, where $T(z) = \frac{1 - \sqrt{1 - 4z}}{2z}$.

Theorem 4. The generating function for the size of the ancestor tree in the monotonically
labelled binary trees is

\[ B_k(z,u,v) = B_{k-1}(z,u,v) + z(1 + u)v B_k^2(z,u,v) + (1 - v) z y_k^2(z). \]

Firstly, the expectation for the size of the ancestor tree is computed. We 
differentiate Bk{z,u,v) with respect to v, let = 1 and use the substitutions 

^ Pk(z,u), Bk{z,u, 1) = yk{z{l + u)), yk(z(l + u)) = %{z). We 
obtain ^k{z, u) = ^k-i{z, u) + yk{z) - %-i{z) + 2^;(1 + u)yk{z)(3k{z, u) - {yk{z) - 
yk-i{z)), with initial conditions yo{z) = 0 and (3o{z,u) = 0. The solution to this 

k r k -1—1 

recursion is (ik{z,u) =5] 0(1 “ 2z(l + u)yi{z)) [yj{z) - yj-i{z) - yj{z) + 

j = l ^i=j -I 

2/j-iW)- 

Next, we substitute yk{z), yk-i{z), yk{^) and yk~i{z) in the fc-th term of (3k 
with their known expansions for z qk. This gives the main term in the re- 



cursion f3k 



Vk-Vk 



l-2z{l-\-u)yk 



ized with Pkq'k^nr‘^1'^ 
ancestor tree: En 



Vqk-z-y/qk-z(l-\-u) 

2qky/qk-z(l-\-u) 



, and it follows that [uP]f3k 



and [z'^u^]Pk 



2p\ 



f 2p\ 

22p+ig^+i \ pj r(p) 

which leads to the expectation for the size of the 



. The latter is normal- 



(fc) 



r 2^P+^pkqk\p 










An analogous method (differentiate Bk{z,u,v) twice with respect to v, then 
let x; = 1) produces the variance for the size of the ancestor tree for monotoni- 

cally labelled binary trees: CpT) ^ ® (v^)> 

(iiife - Cpf) ^ 0, as p ^ 00. 

The second parameter we analyze is the Steiner distance which is the size of 
the subtree spanned by p randomly chosen nodes in a tree. 



Theorem 5. The generating function for the Steiner distance in binary trees is 

_ zv{1-\-u)B'^{z,u,v)—2zvT{z)B{z,u,v)+zT‘^{z){v—2)-\-T{z) 

- l-2zT{z) 



Theorem 6. The Steiner distance in monotonically labelled binary trees has the gen- 
erating function Sk{z,u,v) = 



The expectation and variance for the Steiner distance in our binary tree model 
are E-, 

0(Vn). 

t-ary Trees. We consider monotonically labelled $t$-ary trees with generating
functions $y_1(z) = 1 + z y_1^t(z), \dots, y_k(z) = y_{k-1}(z) + z y_k^t(z)$.



(fc) 

n,p 



(P-I)P (2p' 



2^P{2p-l)pkqk V 



and ^ 



(p-i) 



(p-l)^p^ 

24p(2p-l)2p2g2 



(?)') 



n + 



Theorem 7. The size of the ancestor tree in the monotonically labelled $t$-ary trees
satisfies $G_k(z,u,v) = G_{k-1}(z,u,v) + zv(1 + u)G_k^t(z,u,v) + (1 - v) z y_k^t(z)$.



(k) 






The size of the ancestor tree has expectation Env ' ^ i / ,r , , » i 

Ordered Trees. The generating functions for monotonically labelled ordered 
trees are yi{z) = , . . . , yk{z) = yk-i{z) + 

Theorem 8. The size of the ancestor tree in monotonically labelled ordered trees 
satisfies Pk{z,u,v) = Pk-i{z,u,v) + 



The expectation for the size of the ancestor tree is E\ 



1/2 
Qk. P 



(k) 

'^iP 2“^P^^pk V p 






Theorem 9. The generating function for the Steiner distance in the ordered tree 
models 



Sk{z,u,v) 



1 



{l-yk{z)Y 
X ( Pk{z,u,v) ( 1 



ZV 



{l-yk{z)y 



z{l-v)yk{z)\ 

{i-ykiz)rJ- 



(k) 

The formula for the expectation is En,p ~ 



gy^p(p-i) 

22p(2p-l)pfc 




References 

[1] P. Flajolet and A. Odlyzko. Singularity analysis of generating functions. SIAM Jour- 
nal of Discrete Mathematics, 3:216-240, 1990. 

[2] P. Kirschenhofer. On the average shape of monotonically labelled tree structures. 
Discrete Applied Mathematics, 7:161-181, 1984. 

[3] A. Panholzer and H. Prodinger. Spanning tree size in random binary search trees. 
Annals of Applied Probability, accepted. 







[4] H. Prodinger and F. J. Urbanek. On monotone functions of tree structures. Discrete
Applied Mathematics, 5:223-239, 1983.

Katherine Morris 

University of the Witwatersrand, Johannesburg, South Africa 
kate@maths.wits.ac.za 







Number of Vertices of a Given Outdegree in a 
Galton- Watson Forest 

Tatiana Myllari 



ABSTRACT: Galton-Watson forests consisting of N trees and n non-root 
vertices are considered. The limit distributions of the number of vertices of a given 
outdegree in such a forest are obtained. 



Let us consider the Galton-Watson forest consisting of $N$ trees and $n$
non-root vertices. This forest can be viewed as a set of all realizations of a Galton-Watson
process $G$ with $N$ initial particles conditioned to have the total progeny
$n$. Let us assume that the process has the offspring distribution $P\{\xi = k\} =
p_k$, $k = 0, 1, 2, \dots$, and generating function $F(z) = \sum_k p_k z^k$. Let us introduce a
random variable $\xi^{(r)}$ with the distribution $P\{\xi^{(r)} = k\} = P\{\xi = k \mid \xi \ne r\}$ and
let $\xi_k$, $k = 1, 2, \dots$, be distributed as $\xi$ and $\xi_k^{(r)}$, $k = 1, 2, \dots$, be distributed as
$\xi^{(r)}$. Let us define $S_N = \xi_1 + \dots + \xi_N$ and $S_N^{(r)} = \xi_1^{(r)} + \dots + \xi_N^{(r)}$. Using results of
Kolchin [1] we can obtain the following theorem about the distribution of the number
of vertices with a given outdegree:

Theorem 10. Let $A_r(N, n)$ be the number of vertices with outdegree $r$ in the Galton-Watson
forest with $N$ trees and $n$ non-root vertices. Then

\[ P\{A_r(N, n) = k\} = \binom{N+n}{k} p_r^k (1 - p_r)^{N+n-k}\, \frac{P\{S^{(r)}_{N+n-k} = n - kr\}}{P\{S_{N+n} = n\}}. \]


This theorem shows that in order to obtain the limit distribution of $A_r(N, n)$
it suffices to find the limit distribution of the sums of auxiliary independent random
variables and to use the normal approximation of the binomial distribution.
One can find a proof of Theorem 1 in [2] for $r = 0$ and in [3] for $r > 0$. Using
Theorem 1 we prove that the limit distribution of $A_r(N, n)$ is normal with parameters
depending on how $N$ and $n$ approach infinity. We consider three different
cases (see Theorems 2-4 below; proofs can be found in [2] and [3]). To formulate
these theorems we need some notation and assumptions. Assume that the equation
$zF'(z) = F(z)$ has a solution $c > 0$ satisfying $F(c) < \infty$ and $F''(c) < \infty$.
Then we can assume, without loss of generality, that $c = 1$. Let the variance of
$\xi$ exist and equal $B$. We assume that there exist at least three nonzero probabilities
in the offspring distribution including $p_0$. Let $j^* := \inf\{k > 0 : p_k > 0\}$ and
$l^* := \inf\{k > 0 : p_k^{(r)} > 0\}$. Denote by $d$ and $d_r$ the span of distributions of $\xi$
and $\xi^{(r)}$ respectively. Let $w^*$ be the smallest nonnegative integer such that $j^* + w^*$
determines the span $d$ of the distribution of $\xi$, and $v^*$ be the smallest nonnegative
integer such that $j^* + v^*$ determines the span $d_r$ of the distribution of $\xi^{(r)}$.

Without loss of generality we may assume that the offspring distribution is 
given by P{^ = fc} = A^p/e/F(A), where A is positive number within the circle 







of convergence of F{z). Let m = m(A) and be, respectively, the 

expectation and the variance of Introduce 



=Pr(A)(l -Pr(A) - 



(m-r)V(A) 
^2 > 



( 1 ) 



Theorem 11.. Let N,n oo in such a way that n takes values divisible by dr, 
n/N'^ ^0 . Let 

Amax(3'*,-;)n^0O if V = f, 

^max«, ^00 if r = j* + w*, w* > 0, 
oo otherwise, 



where A is determined by 



Then 



AF'(A) 

P{Kr{N,n) = kdr/d) = 



n 



N -{-n 

dr(^ + ^(1)) 






da^ y/27r{N + n) 

uniformly in the integers k such that u = {kdr/d — (iV + ^)Pr(A))/<J,^ViV + n lies 
in any finite fixed interval 



Theorem 12.. Let N,n ^ oo in such a way that n takes values divisible by dr, 
n/N^ — ^ 7 for some 7 > 0. Then 

P{Ar{N,n) = kdr/d) = — +^)) g-«V2 

dcF^y/2'Kn 

uniformly in the integers k such that u = {kdr/d— {N -{-n)pr + apr\/N + n)/a^^ 
lies in any finite fixed interval, where a = {r — 1)/By/^ and is given as in (1) 
with A = 1 . 



Theorem 13.. Let N,n ^ 00 in such a way that n takes values divisible by dr, 
n/N‘^ 00 . Then 



V{Kr{N,n) = kdr/d) = 



dr{l + 0 ( 1 )) ^-u'^/2 

da^^/rim 



uniformly in the integers k such that u = (kdr/d — (AT + n)pr)/G^^Jn lies in any 
finite fixed interval, where is given as in (1) with A = 1. 



References 

[1] Kolchin, V.F. - Random mappings. Springer, New York, 1986.

[2] Myllari, T. - Limit distributions for the number of leaves in a random forest. Adv. 
in Applied Prob., vol. 34(4), 2002. 

[3] Myllari, T., Pavlov Yu. - Limit distributions of the number of vertices of a given 
outdegree in a random forest. To appear in Journal of Mathematical Sciences. 

Tatiana Myllari 

Abo Akademi University, Abo, Finland 
tatiana.myllari@abo.fi 




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Destruction of Recursive Trees 

Alois Panholzer 



ABSTRACT: We study, for the family of recursive trees, two procedures that
destroy trees by successively removing edges. In both variants, one starts with a
tree T of size n and chooses one of the n − 1 edges at random. Removing this edge
costs a toll depending on the size of T, given by the toll function $t_n$, and leads to
two subtrees T' and T''. In the one-sided variant, the edge-removal procedure is
iterated with the subtree containing the root, whereas in the two-sided variant
it is iterated with both subtrees. For both variants, we study for toll functions
$t_n = n^\alpha$ with $\alpha \ge 0$ the total costs (= sum of the tolls of every step) incurred by
completely destroying random recursive trees, and we compute the asymptotic
behaviour of all moments of this quantity.



1. Introduction 

In this paper, we consider two recursive edge-removal procedures P1 (“one-sided
destructions”) and P2 (“two-sided destructions”) to destroy (rooted) trees.
Both variants start with a tree T of size |T| = n, where the size measures as usual
the number of nodes of T. If n = 1 there are no edges that can be removed and
both procedures P1 and P2 stop, but we assume that this costs the toll $t_1$. If
n ≥ 2, then one of the n − 1 edges in the tree is chosen and afterwards this
edge is removed from T. We assume that removing this edge costs a
certain toll depending on the size of T, which is given by the toll function $t_n$.
After removing this edge, the original tree T falls into two subtrees T' and T'' with
sizes 1 ≤ |T'|, |T''| ≤ n − 1, where one of them (let us assume T') contains the root
of T. In the two-sided variant P2, the edge-removal procedure is now applied
recursively to both subtrees T' and T'', whereas in the one-sided variant P1, the
edge-removal procedure (in [7] called “cutting-down”) is applied only to the
subtree T' that contains the root.

Thus the procedure P2 terminates, and T has been destroyed by P2, when
all n − 1 edges are removed from T, whereas P1 terminates, and thus T has been
destroyed by P1, when the root of T has been isolated. We are now interested in
the total costs (= sum of the costs of every edge-removal step) $C_1(T)$ resp. $C_2(T)$
that occur when destroying a tree T by P1 resp. P2. These quantities
are for |T| ≥ 2 given recursively by

\[ C_1(T) = C_1(T') + t_{|T|} \quad \text{resp.} \quad C_2(T) = C_2(T') + C_2(T'') + t_{|T|}, \qquad (1) \]

where T', T'' are the subtrees appearing after the first edge-removal step and T'
contains the root of T. If |T| = 1 then $C_1(T) = C_2(T) = t_1$. An example for
destroying a tree by P1 resp. P2 is given in Figure 1.
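
The two procedures can be made concrete with a small simulation. The following Python sketch is illustrative only: the edge-list representation, the function name `destruction_cost` and the sample tree are assumptions introduced here and are not taken from the paper (in particular, the sample tree is not claimed to be the tree of Figure 1).

```python
import random

def destruction_cost(edges, root, toll, two_sided, rng):
    """Total cost of destroying a rooted tree by random edge removals.

    edges     -- list of (u, v) pairs forming a tree (hypothetical representation)
    root      -- label of the root node
    toll      -- function n -> t_n
    two_sided -- False for procedure P1, True for procedure P2
    """
    nodes = {root} | {v for e in edges for v in e}
    n = len(nodes)
    if n == 1:
        return toll(1)                      # no edge left: pay t_1 and stop
    cut = rng.choice(edges)                 # uniformly chosen edge
    rest = [e for e in edges if e != cut]
    # Collect the component T' that still contains the root.
    adj = {v: [] for v in nodes}
    for a, b in rest:
        adj[a].append(b)
        adj[b].append(a)
    side, stack = {root}, [root]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in side:
                side.add(w)
                stack.append(w)
    cost = toll(n) + destruction_cost(
        [e for e in rest if e[0] in side], root, toll, two_sided, rng)
    if two_sided:
        # T'' is the other component. Any of its nodes can serve as its root,
        # because P2 destroys both parts completely anyway.
        other_root = cut[0] if cut[0] not in side else cut[1]
        cost += destruction_cost(
            [e for e in rest if e[0] not in side], other_root, toll, two_sided, rng)
    return cost

# Hypothetical sample tree of size 7 with root 1.
SAMPLE_EDGES = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (6, 7)]
rng = random.Random(2004)
c1 = destruction_cost(SAMPLE_EDGES, 1, lambda n: n, False, rng)
c2 = destruction_cost(SAMPLE_EDGES, 1, lambda n: n, True, rng)
print(c1, c2)
```

With the constant toll $t_n = 1$ for all n ≥ 1, the two-sided cost of any tree of size n is always 2n − 1 (n − 1 removed edges plus n isolated nodes), which gives a simple sanity check for such a simulation.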

In this paper, we study for toll functions $t_n = n^\alpha$ with $\alpha \ge 0$ the random
variables $X_n$ (resp. $Y_n$), which measure the total costs that accumulate when
destroying a random recursive tree of size n by the random edge-removal procedures
P1 (resp. P2). The tree family considered and the probability model used are
described next.

Figure 1. Destruction of a tree T of size 7 by the procedures
P1 and P2. Using the toll function $t_n = n$ for n ≥ 1, the one-sided
destruction has total costs $C_1(T) = 17$ and the two-sided
destruction has total costs $C_2(T) = 28$. Here, $\epsilon$ denotes the empty
tree.

The family of recursive trees can be defined in the following way. A rooted
labelled tree T of size n with labels 1, 2, . . . , n is a recursive tree if the root is
labelled with 1 and, for each node v, the labels of the vertices on the
unique path from the root to v form an increasing sequence. It is easily seen that
there are $T_n = (n-1)!$ different size-n recursive trees. As the model of randomness
we use the random tree model, which means that every recursive tree of size n can
be chosen as input for the edge-removal procedures with equal probability $1/T_n$.
We speak then about random recursive trees or uniform recursive trees. For a
survey of applications and results on random recursive trees see [5].

Further, we will always assume for P1 resp. P2 that the removed edges are
at each stage chosen at random from the remaining tree. We speak thus about the
random edge-removal procedures.
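
The definition of recursive trees translates directly into code. The sketch below is a minimal illustration under an assumed representation (a dictionary mapping each non-root node to its parent); the function names are hypothetical. It relies on the observation that the increasing-path condition is equivalent to every node v ≥ 2 having a parent with a smaller label, so a uniform recursive tree can be sampled by letting node v pick its parent uniformly from 1, ..., v − 1.

```python
import random
from itertools import product
from math import factorial

def all_recursive_trees(n):
    """Yield every recursive tree of size n as a dict child -> parent.

    The increasing-path condition is equivalent to parent[v] < v for v = 2..n.
    """
    for parents in product(*[range(1, v) for v in range(2, n + 1)]):
        yield dict(zip(range(2, n + 1), parents))

def random_recursive_tree(n, rng):
    """Sample a uniform recursive tree: node v picks its parent uniformly in 1..v-1."""
    return {v: rng.randint(1, v - 1) for v in range(2, n + 1)}

# The number of size-n recursive trees is (n - 1)!.
for n in range(1, 7):
    assert sum(1 for _ in all_recursive_trees(n)) == factorial(n - 1)

print(random_recursive_tree(6, random.Random(0)))
```

Each tree produced by `random_recursive_tree` has probability 1/(n − 1)!, because the n − 1 parent choices are independent and uniform.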

If one chooses the toll function $t_n = 1$ for n ≥ 2 with $t_1 = 0$, then $X_n$ measures
exactly the number of edges that are removed by the cutting-down procedure P1
to destroy a random size-n tree. This quantity was studied by Meir and Moon in
[7], where they obtained the following results for the first two moments:
$E(X_n) \sim n/\log n$ and $E(X_n^2) \sim n^2/\log^2 n$. Choosing the toll function $t_n = 1$, the (corresponding)
quantity was also studied for other tree families, see e.g. [6, 8]. For unrooted
labelled trees, the (corresponding) quantity was studied for a few toll functions
$t_n$ in [4] and for general toll functions $t_n = n^\alpha$ in [1].



2. Results and mathematical preliminaries 

We analyzed the behaviour of the moments $E(X_n^s)$ resp. $E(Y_n^s)$ of the total costs
when destroying size-n recursive trees with procedures P1 resp. P2 for toll functions
$t_n = n^\alpha$ with $\alpha \ge 0$ resp. $\alpha > 0$ ($\alpha = 0$ for $Y_n$ is trivial, since then $Y_n = 2n - 1$)
and have obtained the following results. Here $H_n := \sum_{k=1}^{n} \frac{1}{k}$ denotes as usual the
n-th harmonic number and $\Psi(x) := \frac{d}{dx} \log \Gamma(x)$ denotes the Psi-function.







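Theorem 2.1. The s-th moments $E(X_n^s)$ resp. s-th centered moments
$E([X_n - E(X_n)]^s)$ of the total costs incurred by one-sided destructions of random
recursive trees for toll functions $t_n = n^\alpha$ with $\alpha \ge 0$ are, for s an integer and
$n \to \infty$, asymptotically given by

\[ E(X_n^s) = \frac{1}{(\alpha+1)^s} \frac{n^{s(\alpha+1)}}{\log^s n} + \frac{H_s + s + (\alpha+1)\sum_{l=1}^{s} \Psi(l(\alpha+1))}{(\alpha+1)^{s+1}} \frac{n^{s(\alpha+1)}}{\log^{s+1} n} + O\Big(\frac{n^{s(\alpha+1)}}{\log^{s+2} n}\Big), \quad s \ge 1, \qquad (2) \]

\[ E([X_n - E(X_n)]^s) = \Big[\frac{(-1)^{s-1}}{s(\alpha+1)^{s+1}} + \frac{1}{(\alpha+1)^s} \sum_{l=0}^{s-1} \binom{s-1}{l} (-1)^{s-1-l}\, \Psi((l+1)(\alpha+1))\Big] \frac{n^{s(\alpha+1)}}{\log^{s+1} n} + O\Big(\frac{n^{s(\alpha+1)}}{\log^{s+2} n}\Big), \quad s \ge 2. \qquad (3) \]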



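Theorem 2.2. The s-th moments $E(Y_n^s)$ resp. s-th centered moments
$E([Y_n - E(Y_n)]^s)$ of the total costs incurred by two-sided destructions of random
recursive trees for toll functions $t_n = n^\alpha$ with $\alpha > 0$ are, for s an integer and
$n \to \infty$, asymptotically given by

\[ E(Y_n^s) = \frac{1}{(\alpha+1)^s} \frac{n^{s(\alpha+1)}}{\log^s n} + \gamma_s \frac{n^{s(\alpha+1)}}{\log^{s+1} n} + O\Big(\frac{n^{s(\alpha+1)}}{\log^{s+2} n}\Big), \quad s \ge 1, \qquad (4) \]

\[ E([Y_n - E(Y_n)]^s) = \delta_s \frac{n^{s(\alpha+1)}}{\log^{s+1} n} + O\Big(\frac{n^{s(\alpha+1)}}{\log^{s+2} n}\Big), \quad s \ge 2, \qquad (5) \]

where the appearing constants $\gamma_s$ and $\delta_s$ are given by

\[ \gamma_s = \frac{1}{(\alpha+1)^s} \Big[\frac{s}{\alpha+1} + \sum_{l=1}^{s} \Psi(l(\alpha+1)) + \sum_{l=1}^{s} \frac{2}{l(\alpha+1) - 1} + 2 \sum_{l=1}^{s} \sum_{j=1}^{l-1} \binom{l}{j} \frac{\Gamma(j(\alpha+1)+1)\, \Gamma((l-j)(\alpha+1)-1)}{\Gamma(l(\alpha+1)+1)}\Big], \]

\[ \delta_s = \frac{1}{(\alpha+1)^s} \Big[\sum_{l=0}^{s-1} \binom{s-1}{l} (-1)^{s-1-l}\, \Psi((l+1)(\alpha+1)) + \frac{2(-1)^{s-1}(s-1)!\,(\alpha+1)^{s-1}}{\prod_{l=1}^{s} (l(\alpha+1) - 1)} + 2 \sum_{l=1}^{s} \binom{s-1}{l-1} (-1)^{s-l} \sum_{j=1}^{l-1} \binom{l}{j} \frac{\Gamma(j(\alpha+1)+1)\, \Gamma((l-j)(\alpha+1)-1)}{\Gamma(l(\alpha+1)+1)}\Big]. \]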

From these theorems it follows that for the toll $t_n = n^\alpha$ the r. v.
$\frac{(\alpha+1) \log n}{n^{\alpha+1}} X_n$ resp. $\frac{(\alpha+1) \log n}{n^{\alpha+1}} Y_n$ converge in probability to 1 with convergence of all moments.
Furthermore, if $W_n$ (resp. $\tilde{W}_n$) denotes a zero-mean and unit-variance normalization
of $X_n$ (resp. $Y_n$), then $W_n$ (resp. $\tilde{W}_n$) has s-th moments growing like $\log^{s/2-1} n$
for s > 2. This shows that if $W_n$ (resp. $\tilde{W}_n$) has a limiting distribution, this cannot
be established by the method of moments. It remains open whether there exists
a limiting distribution for some centered and scaled version of $X_n$ (resp. $Y_n$).
It also seems surprising that we do not have a lead-order discrepancy between
$X_n$ and $Y_n$, although of course $Y_n \ge X_n$ holds.

To prove Theorem 2.1 and Theorem 2.2, we use a recursive approach, as it
was done for one-sided destructions and the special toll function $t_n = 1$ in [7].
That this recursive approach is indeed permitted is stated in Lemma 3.1. The
appearing distribution recurrences (6) lead to recurrences (8) and (12) for the s-th
moments. Using generating functions, we obtain differential equations for every
s-th moment that can be solved, where the solutions (11) and (15) are composed of
the operations differentiation, integration and the Hadamard product of generating
functions of the lower moments and the toll function. The Hadamard product
$F(z) \odot G(z)$ of generating functions $F(z) = \sum_{n \ge 0} f_n z^n$ and $G(z) = \sum_{n \ge 0} g_n z^n$
is defined by $F(z) \odot G(z) := \sum_{n \ge 0} f_n g_n z^n$. Moreover we use in this paper the
abbreviation $F^{\odot s}(z) := F(z) \odot \cdots \odot F(z)$ (s times) $= \sum_{n \ge 0} f_n^s z^n$.

To extract the asymptotic information from the solutions (11) and (15) we
cannot use the extension of the “singularity-analysis toolbox” given in [1], since
the theorems shown therein deal only with positive integral powers of logarithmic
terms whereas here negative powers occur. Thus we go back to a quite
elementary, but here efficient, approach that computes the asymptotic growth
directly at the level of the coefficients. This has the advantage that the effect of
the operations integration, differentiation and Hadamard product on the growth
of the coefficients can be described easily, whereas of course difficulties may arise
when computing the growth of the coefficients of the Cauchy product. But for the
problem considered here, the two summation formulae (16) and (17) are sufficient.
Using them, the computations for one-sided destructions are done in Section 5 and
(only sketched) for two-sided destructions in Section 6.
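
At the level of coefficients, the Hadamard product is simply a termwise product. The following Python sketch illustrates this on truncated coefficient lists; the list representation, the function names and the chosen second factor (coefficients 1 − 1/n for n ≥ 2) are assumptions made purely for illustration.

```python
from fractions import Fraction

def hadamard(f, g):
    """Coefficientwise (Hadamard) product of two truncated coefficient lists."""
    return [a * b for a, b in zip(f, g)]

def hadamard_power(f, s):
    """s-fold Hadamard product of f with itself."""
    return [a ** s for a in f]

N, alpha = 6, 2
t = [Fraction(n) ** alpha for n in range(N)]            # t_n = n^alpha
g = [Fraction(0), Fraction(0)] + [1 - Fraction(1, n) for n in range(2, N)]
print(hadamard(t, g))          # coefficients n^alpha - n^(alpha-1) for n >= 2
print(hadamard_power(t, 3))    # coefficients n^(3*alpha)
```

This is why the effect of the Hadamard product on coefficient growth is easy to describe: growth orders simply multiply, in contrast to the Cauchy product, where a convolution sum has to be estimated.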



3. Recurrences for the quantities considered 

The basic idea in our approach is to study the distribution recurrences

\[ X_n = X_{K_n} + t_n, \ n \ge 2, \quad X_1 = t_1; \qquad Y_n = Y_{K_n} + \tilde{Y}_{n-K_n} + t_n, \ n \ge 2, \quad Y_1 = t_1, \qquad (6) \]

where $Y_n$ and $\tilde{Y}_n$ are identically and independently distributed random variables
and $K_n$ is independent of $X_n$, $Y_n$ and $\tilde{Y}_n$. The random variable $K_n$ is given
by the splitting probabilities $p_{n,k} := P\{K_n = k\}$, for 1 ≤ k ≤ n − 1, where $p_{n,k}$ is
the probability that after removing a random edge from a size-n random recursive
tree, the subtree containing the root has size k.

To reduce this problem to a study of (6), it is of course necessary that randomness
is preserved by cutting off a random edge. This means that after removing
a randomly selected edge from a size-n random recursive tree, the remaining subtrees
with sizes k resp. n − k are, after natural order-preserving relabellings of the
nodes with labels {1, . . . , k} resp. {1, . . . , n − k}, again random recursive trees of
sizes k resp. n − k. This property of random recursive trees was shown implicitly in
[7] when computing the splitting probabilities $p_{n,k}$. Since this is a crucial point in
our approach, we will restate their proof. Thus randomness is actually preserved
by cutting off a random edge and the recurrences (6) follow directly from (1).

Lemma 3.1 (Meir and Moon, 1974). Let us assume that we choose a random recursive
tree T of size n and also one of its n − 1 edges at random, and after removing
this edge, the remaining subtrees T' resp. T'' are of sizes k resp. n − k, where we
further assume that T' contains the root of T. Then it holds that, after an
order-preserving relabelling of the nodes, both subtrees are random recursive trees of sizes
k resp. n − k and the splitting probabilities $p_{n,k}$ are given by

\[ p_{n,k} = \frac{n}{(n-1)(n-k)(n-k+1)}, \quad \text{for } 1 \le k \le n-1. \qquad (7) \]







Proof. Starting with a size-n recursive tree and removing one of its n − 1
edges, we obtain a subtree T' of size 1 ≤ k ≤ n − 1 which contains the root of
T and another subtree T'' of size n − k. After the order-preserving relabellings,
we can consider both subtrees as recursive trees. Now we want to count how
often we can obtain a particular pair (T', T'') of recursive trees with sizes k resp.
n − k when removing one edge of recursive trees of size n. It will turn out that
this quantity $w(T', T'')$ depends only on the sizes k and n, not on the particular
chosen trees T' and T'', and the lemma will be proven. Equivalently we can go
the other way around and ask in how many ways $w(T', T'')$ we can reconstruct
size-n recursive trees from the pair (T', T''). Let us assume that the removed edge
originally connected the root of T'' with the node with label j in T'. Then all n − k
nodes in T'' must have labels larger than j. We have thus $\binom{n-j}{n-k}$ possibilities to
select them from {j+1, . . . , n} and distribute them order preserving to T'', whereas
the remaining k − j labels from {j+1, . . . , n} are distributed order preserving to the
nodes of T' with labels larger than j. This gives in that case $\binom{n-j}{n-k}$ different size-n
recursive trees. By summing up over 1 ≤ j ≤ k we find that, independently of the pair T' and
T'', we always have $w(T', T'') = \sum_{j=1}^{k} \binom{n-j}{n-k} = \binom{n}{k-1}$ and therefore randomness is
preserved.

Due to the given bijection between pairs of recursive trees with sizes k and
n − k and the pairs consisting of a size-n recursive tree and one of its n − 1 edges,
we obtain the equations $\sum_{k=1}^{n-1} \binom{n}{k-1} T_k T_{n-k} = (n-1) T_n$ and $p_{n,k} = \binom{n}{k-1} \frac{T_k T_{n-k}}{(n-1) T_n}$.
Thus (7) is also shown.

□
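
The splitting probabilities (7) can be checked by brute force for small n. The sketch below is an illustration, not part of the original argument: it enumerates all recursive trees of size n as parent maps (an assumed representation), removes each edge in turn, and tabulates the size of the component containing the root. It only verifies the probabilities, not the stronger statement that the subtrees are again uniform.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def splitting_distribution(n):
    """Exact distribution of the root-side size k over all (tree, edge) pairs."""
    counts = Counter()
    # parent[v] for v = 2..n ranges over 1..v-1 (all recursive trees of size n)
    for parents in product(*[range(1, v) for v in range(2, n + 1)]):
        parent = dict(zip(range(2, n + 1), parents))
        # Subtree sizes: children carry larger labels, so go in decreasing order.
        size = {v: 1 for v in range(1, n + 1)}
        for v in range(n, 1, -1):
            size[parent[v]] += size[v]
        # Removing the edge (parent[v], v) cuts off the subtree rooted at v.
        for v in range(2, n + 1):
            counts[n - size[v]] += 1
    total = sum(counts.values())
    return {k: Fraction(c, total) for k, c in counts.items()}

def p_formula(n, k):
    """Splitting probability p_{n,k} as stated in (7)."""
    return Fraction(n, (n - 1) * (n - k) * (n - k + 1))

for n in range(2, 7):
    assert splitting_distribution(n) == {k: p_formula(n, k) for k in range(1, n)}
print(splitting_distribution(4))
```

Exact rational arithmetic is used so that the comparison with (7) is an equality rather than a floating-point approximation.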

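From the distribution recurrence (6), we obtain then the following recurrences
for the s-th moments $\mu_n^{[s]} := E(X_n^s)$ of $X_n$:

\[ \mu_n^{[s]} = E\big((X_{K_n} + t_n)^s\big) = \sum_{s_1+s_2=s} \binom{s}{s_1} t_n^{s_1}\, E\big(X_{K_n}^{s_2}\big) = \sum_{s_1+s_2=s} \binom{s}{s_1} t_n^{s_1} \sum_{k=1}^{n-1} p_{n,k}\, \mu_k^{[s_2]}. \]

We write them as

\[ \mu_n^{[s]} = \sum_{k=1}^{n-1} p_{n,k}\, \mu_k^{[s]} + r_n^{[s]}, \quad n \ge 2, \qquad \mu_1^{[s]} = t_1^s, \qquad (8) \]

with $r_n^{[s]} = \sum_{s_1+s_2=s,\; s_2<s} \binom{s}{s_1} t_n^{s_1} \sum_{k=1}^{n-1} p_{n,k}\, \mu_k^{[s_2]}$.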

Introducing the generating functions resp. the common abbreviation 

n n 1 — z 



T ,>1 



1>1 



n>2 



we obtain from (8) by multiplying with and summing up: 



n>2 



n — 1 



EE 



1 



^ _ k){n - k 






(9) 



S1+52=S, 

S2<S 







This gives thus the differential equation 

where is given by (9). Solving and adapting to the initial condition 

= JjLi^ = we obtain the solution 






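Analogously we obtain from the distribution recurrence (6) the following
recurrences for the s-th moments $\lambda_n^{[s]} := E(Y_n^s)$ of $Y_n$:

\[ \lambda_n^{[s]} = E\big((Y_{K_n} + \tilde{Y}_{n-K_n} + t_n)^s\big) = \sum_{s_1+s_2+s_3=s} \binom{s}{s_1, s_2, s_3} t_n^{s_1} \sum_{k=1}^{n-1} p_{n,k}\, \lambda_k^{[s_2]} \lambda_{n-k}^{[s_3]}. \]

We write them as

\[ \lambda_n^{[s]} = \sum_{k=1}^{n-1} p_{n,k} \big(\lambda_k^{[s]} + \lambda_{n-k}^{[s]}\big) + r_n^{[s]}, \quad n \ge 2, \qquad \lambda_1^{[s]} = t_1^s, \qquad (12) \]

with $r_n^{[s]} = \sum_{s_1+s_2+s_3=s,\; s_2,s_3<s} \binom{s}{s_1, s_2, s_3} t_n^{s_1} \sum_{k=1}^{n-1} p_{n,k}\, \lambda_k^{[s_2]} \lambda_{n-k}^{[s_3]}$.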
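
For s = 1 the moment recurrences reduce to $E(X_n) = t_n + \sum_k p_{n,k} E(X_k)$ and $E(Y_n) = t_n + \sum_k p_{n,k} (E(Y_k) + E(Y_{n-k}))$, which can be iterated exactly. The following Python sketch does this with rational arithmetic; the function names are hypothetical, and the quadratic-time loop is meant for small n only.

```python
from fractions import Fraction

def p(n, k):
    """Splitting probability p_{n,k} from (7)."""
    return Fraction(n, (n - 1) * (n - k) * (n - k + 1))

def expected_costs(n_max, toll):
    """Exact E(X_n) and E(Y_n) for n = 1..n_max via the first-moment recurrences."""
    ex = {1: Fraction(toll(1))}
    ey = {1: Fraction(toll(1))}
    for n in range(2, n_max + 1):
        ex[n] = toll(n) + sum(p(n, k) * ex[k] for k in range(1, n))
        ey[n] = toll(n) + sum(p(n, k) * (ey[k] + ey[n - k]) for k in range(1, n))
    return ex, ey

ex, ey = expected_costs(8, lambda n: n)    # toll t_n = n, i.e. alpha = 1
print(float(ex[8]), float(ey[8]))
```

Two simple checks are available: for the toll $t_n = 1$ (n ≥ 2), $t_1 = 0$ one gets $E(X_3) = 7/4$ by hand, and for the constant toll $t_n = 1$ the two-sided recurrence must return 2n − 1.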

With the generating function — , we obtain from (12) by 

n>l 

multiplying with and summing up: 

^aw:^^" = z£aW(^)-aW(z), 

n>2 

EE in-k){Lk+l/ ^^^ = 



’W= E 



Si+S2+S3=s, 

S2,S3<s 






This leads to the following equation with given by (13): 

(1-z)L(z)^AW(;s)-AW(x)--^ r Al*'(t)dt = rW(^). (14) 

dz I - z 

By introducing the functions Al^l(z) := = Xln>i this equa- 

tion can be transformed to the first order differential equation 







which has by adapting to the initial condition A1^1(0) = = tf the solution 

= (1^ L + {ihp- 



4. Summation lemmata 

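Essential to our approach are the following auxiliary formulae to control the
asymptotic growth of the appearing convolutions.

Lemma 4.1. For $\alpha, \beta \in \mathbb{R}$ with $\alpha, \beta > -1$ and $p, q \ge 0$ the following expansions
hold:

\[ \sum_{k=2}^{n-2} \frac{k^\alpha (n-k)^\beta}{\log^p(k) \log^q(n-k)} = \frac{n^{\alpha+\beta+1}}{\log^{p+q} n} \frac{\Gamma(\alpha+1)\Gamma(\beta+1)}{\Gamma(\alpha+\beta+2)} \Big(1 - \frac{q\Psi(\beta+1) + p\Psi(\alpha+1) - (p+q)\Psi(\alpha+\beta+2)}{\log n} + O\Big(\frac{1}{\log^2 n}\Big)\Big), \qquad (16) \]

\[ \sum_{k=2}^{n-2} \frac{(n-k)^\alpha}{k(k-1) \log^p(n-k)} = \frac{n^\alpha}{\log^p n} + O\Big(\frac{n^{\alpha-1}}{\log^{p-1} n}\Big) + O\Big(\frac{\log n}{n}\Big). \qquad (17) \]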



Proof. To obtain (16), we start with the following expansion: 
\og^ (k) log^ (n — k) \og^^^ n loe-^No/. loeid-^)^^ 



]ncr^ ( k^ ]cter^ ( n. — k^ ^ 



fc=LlognJ (1+1^) (1+ logn j 

k>^{n - fc)“ 



^ log«(fc) logP(n - k) nj log*’(n - fc) 

+^+l n-^nj gl°g«+PMl-^) 

logP+^n ^ nVn/ V nJ L logn 

fc=[log nJ 



g^^^n ^ nVni V n/ L logn \Tog^j 

^ fc^LlognJ ^ 

0(- ^'°" ) + 0( ’°"3n"^ )] +0(n“(logn)^+i-P) +OK(logn)- 



The appearing sums can be considered as Riemann sums for the Beta integral 
resp. their derivatives. As examples, we give the following two computations: 

1,7-. /q. pi ./I \0 . .n \a . 



fc=LlognJ 

^ r(g+l)r(^+l) ,-/' (logn)^ \ / (logw)° \ 

r(a + /3 + 2 ) V n'^+i ) ^ \ n“+i /’ 

k=[\ognj 

I ,c.( i^osri)°‘ \ _ 9 / r(g + l)r(/? + l) \ -/ (logn)^+S / (logn)“ \ 

n“+i ) dl3\ r(g + /3 + 2) )^ \ n^+i 7 V n“+i )’ 

where we obtain ^ + 1) - 'J'Ca + /? + 2)). 

Analogously one can treat the remaining sums which leads eventually to (16). 
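
The role of the Beta integral can be seen numerically in the simplest case p = q = 0, where no logarithmic factors appear. The following Python sketch is only an illustration under that simplifying assumption: it compares the normalized sum $n^{-(\alpha+\beta+1)} \sum_k k^\alpha (n-k)^\beta$ with $\Gamma(\alpha+1)\Gamma(\beta+1)/\Gamma(\alpha+\beta+2)$. The function names are hypothetical.

```python
import math

def riemann_sum(alpha, beta, n):
    """n^-(alpha+beta+1) * sum_{k=1}^{n-1} k^alpha (n-k)^beta, written as a Riemann sum."""
    return sum((k / n) ** alpha * (1 - k / n) ** beta for k in range(1, n)) / n

def beta_integral(alpha, beta):
    """Integral of x^alpha (1-x)^beta over [0, 1], i.e. B(alpha+1, beta+1)."""
    return math.gamma(alpha + 1) * math.gamma(beta + 1) / math.gamma(alpha + beta + 2)

for n in (10, 100, 1000):
    print(n, riemann_sum(1.0, 2.0, n), beta_integral(1.0, 2.0))
```

With logarithmic factors present (p, q > 0) the convergence is much slower, of relative order 1/log n, which is exactly what the correction term in (16) quantifies.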



n-[lognJ - , ^ , 



fc^L^og 







It remains to prove (17). We start with dissecting the summation interval 



n-2 

E 

k=2 



Llogn-I ^ n — 2 

_ ^ jn-k) I 

fc(fe- l)lOgP(n-fe) J. V J I ^ fc(fc-l)log^(n-fc) 



(n - fc)° 



(n— fc)^ 



I — ^ - 1 

*- log n -■ 



_ ~ n)° 






log^n> 



where the remainder bounds are coming from the estimate = ^( logPn )"^ 

0(1), which combines bounds for a > 0 and a < 0 in the considered range < 

k < n — 2. The remaining sum can be evaluated asymptotically which gives the 
main term in (17): 



I ^ I 

L log n -I 



I I 

L log n -I 

^ in I ^ k(k — 1) 

k=2 k{k-l){l+—^^^) k=2 ^ ^ 



I ^ I 

^ loe n -I 



= E 



k=2 



k{k - 1) 



i)^P A: k(k-l){ '^n^J 

(1^)_ 



+ 0 



. L log n -I 1 y 
I ^k=2 fc-1 \ 
\ n ) 



1 + 0 



□ 



5. One-sided destructions 



Now we want to prove the results of Theorem 2.1 concerning the asymptotic
behaviour of the s-th moments of the total costs $X_n$ when destroying random recursive
trees with toll functions $t_n = n^\alpha$ for n ≥ 2 with $\alpha \ge 0$ and $t_1 = 0$. Choosing
$t_1 = 0$ instead of $t_1 = 1$ has of course no influence on the stated asymptotic
behaviour ($X_n^* = X_n + 1$, if $X_n^*$ measures the total costs with toll function $t_n = n^\alpha$,
for n ≥ 1), but (11) is then slightly simpler.

In order to reduce extracting coefficients from (11) to an application of
formulae (16) and (17) and to avoid dealing with convolutions of functions growing as

72"^, we introduce the generating functions := 

and differentiate (11): 






, 1 r rMft) 



(18) 



where A^\z) is given by (9) with there appearing functions t{z) = 

Now we want to show the expansion (2) by induction, where we additionally 
use, for /3,g > 0, the asymptotic growth of the coefficients (see [2]) 



TV 



(3-1 






1 



' (1 - z)0 (L{z)y r(/?)log®n V'^ ' logn ' ^''log^n^>^’ 

We further use the trivial effect to the growth of the coefficients when differenti- 
ating and integrating generating functions F{z) =J2n>2 with /n = 



log^ n ■ 










Since = X]n>i 7T ~ = t{z) © - 1 — L{z)] , we obtain 

[ 2 "]<(^) = n“, l-L(z)Wl--, [z"]rW(z) = n“-n“-\ forn > 2. 

Now we use (16) and (19) to obtain 

' (1 - 'z)t(z) + °(i^)) ((” “ + 0 (("-'=)“-")) + 0(n“) 



n-2 

E 



fc=2 



(n — fc)° 
log /c 



+ ®(‘) E + ° ( £ ) + « ( £ +»("“) 



1 ^ ^(a + 2) n' 



a+l 



fc=2 

a+1 



a + 1 log n a -h 1 log^ 






( 21 ) 



and 






'{l-z)LHz) 



= E + “(ijr^)) ((» - ‘)“ + o((” - ‘)-)) + o("“) 



E 



(n - k)° 

1 n^*+^ 

a + 1 log^ n 



fe=2 
n-2 



+o(E^^)+«(Ei^)+«("*) 



(n — A:)^ 



fe=2 

a+1 



log k 



log k 






We proceed with [z”] another 

m formula (16): 



a+1 log^ n 

application of the convolution formula (16): 

(n-fc)“ 



1 r 

1 - 2 Jt=0 



dt = ^ V ~ + O/" (»-fc)° "i 

a + 1 E ^ E iog3(„-fc) j 



=0 (1 - “ + 1 ^ log^(»i - A:) fc=2 



+ o(i^) 



1 

+ 0 



+ l)^log^ 



n 



V log^ n / 



( 22 ) 



Adding (21) and (22), we obtain finally

\[ \mu_{n+1}^{[1]} = E(X_{n+1}) = \frac{1}{\alpha+1} \frac{n^{\alpha+1}}{\log n} + \frac{2 + (\alpha+1)\Psi(\alpha+1)}{(\alpha+1)^2} \frac{n^{\alpha+1}}{\log^2 n} + O\Big(\frac{n^{\alpha+1}}{\log^3 n}\Big). \]

Substituting n − 1 for n, it follows that expansion (2) is valid for s = 1.

Now we assume that for a given s > 2, all moments E(X^) = /zIT' with 
1 < r < s have the expansion (2). We want to compute then the asymptotic 
growth of the coefficients [z'^]r^^^{z). To do this, we use [z'^]t^^^ (z) = and the 
expansion given below which follows from (17) and (20): 

[znif/^^\z)){z-{l-z)L{z)) = 

• 1 + 0(— ), for S 2 = 0, 

\ /n ' 



n-2 ^ 

E yjf — 1) [(a 



1 (n + l-fc)*='“+^’ Hs^ + S2 + (g + 1) E?=1 + 1)) 



+ 



fc(fc — 1) L(a + 1)®2 log®^ (n + 1 — fc) (a+l)® 2 +i 



log®2+i(^ 1 _ /j.) Vlog^2"^^(n + 1 — A;) 







1 n*2(«+i) +s2 + (a + l)Ei=i^(Ka + l)) 

(a + l)®2 log^^n (a + l)«2+i log*2+^n 



, ^S2(a + 1) 

<l0g^2+2 ^ 



^ , for 1 < 52 < 5 . 



Thus for Si + S 2 = 5 and the restriction 0 < S 2 < s, we can describe the 
growth of the summands of [z'^]A^\z): 



I' S _ (1 _ 



^sa_j_0(^sQi for 52 = 0, 

/ 5 \ 1 / g \ iJ,, + 52 + (g + 1) Eth + 1)) 

I 5i 1 (a + 1)^2 log^2 ^ I l)® 2 +i n 



Tog®2+2 



for 1 < 52 < s. 



Therefore the maximum is obtained when choosing S 2 = s — 1 and the other 
contributions are asymptotically negligible. This leads to the required expansion 

r 1 S ^s(a+l) — 1 



(a + l)*-i log""^n 

s[Hs-i + 5 - 1 + (q + 1) E?=i + 1))] 



^s(a+l)-l > 

log^“^^ n ' 



{a + ly log^ n V log^“^^ n / * 

Then by applying (16) and (19), we obtain after some easy manipulations 
the following expansion: 



2-llogfc^ 



'(l-«)L(z) ^Vlogfc log^fc Mog^fcV 

r s (n - s[Hs-i + s - 1 + (a + 1) EtZi + 1))] 

l(a + l)®“^ log®“^(n — A:) (a + 1)® 



\og^ (n — k) V log®“^^(n — /c) /J V log^“^n / 

1 if, + 5 - ^ + (g + 1) ^0(o: + 1)) 

(g + 1)^ log^ n (g H- 1)^+^ log^^^ r 






/ n^("+i)-i x 
V log^-^ n / 



log^^^ n Vlog^“*“^n/ 



Analogously we obtain 



^ kl-zmz) ~ (a + l)Mog*+'n^ Vlog*+2„ 



We proceed with [ 2 "] (T-t)i%) ^^ = (a+i)^ +Q( "iog° +^'»' ) 

again by using the convolution formula the expansion 



I- z {l-t)L‘^{t) ^L(a+1)« log^+^(n-/c) 



log®^^(n — k) 



s(a+l)-l , 



log®^^ n / 5(g + 1)^+^ log^”^^ n vlog^"**^ n> 







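Adding (23) and (24), we obtain

\[ \mu_{n+1}^{[s]} = E(X_{n+1}^s) = \frac{1}{(\alpha+1)^s} \frac{n^{s(\alpha+1)}}{\log^s n} + \frac{H_s + s + (\alpha+1)\sum_{l=1}^{s} \Psi(l(\alpha+1))}{(\alpha+1)^{s+1}} \frac{n^{s(\alpha+1)}}{\log^{s+1} n} + O\Big(\frac{n^{s(\alpha+1)}}{\log^{s+2} n}\Big). \]

Substituting n − 1 for n, it follows that expansion (2) is valid also for s.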

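To show the formula (3) for the centered moments, we plug in the asymptotic
expansion (2) for the ordinary moments and get

\[ E([X_n - E(X_n)]^s) = \sum_{k=0}^{s} \binom{s}{k} (-1)^{s-k}\, E(X_n^k)\, (E(X_n))^{s-k} \]

\[ = \frac{n^{s(\alpha+1)}}{(\alpha+1)^s \log^s n} \Big[\sum_{k=0}^{s} \binom{s}{k} (-1)^{s-k} + \frac{1}{(\alpha+1) \log n} \sum_{k=0}^{s} \binom{s}{k} (-1)^{s-k} \Big(H_k + s\big(2 + (\alpha+1)\Psi(\alpha+1)\big) - k\big(1 + (\alpha+1)\Psi(\alpha+1)\big) + (\alpha+1) \sum_{l=1}^{k} \Psi(l(\alpha+1))\Big) + O\Big(\frac{1}{\log^2 n}\Big)\Big]. \]

Since it holds for s ≥ 1 (see e.g. [3, p. 187ff]):

\[ \sum_{k=0}^{s} \binom{s}{k} (-1)^{s-k} H_k = \frac{(-1)^{s-1}}{s}, \qquad \sum_{k=0}^{s} \binom{s}{k} (-1)^{s-k} \sum_{l=1}^{k} \Psi(l(\alpha+1)) = \sum_{l=0}^{s-1} \binom{s-1}{l} (-1)^{s-1-l}\, \Psi((\alpha+1)(l+1)), \]

and for s ≥ 2: $\sum_{k=0}^{s} \binom{s}{k} (-1)^{s-k} = \sum_{k=0}^{s} \binom{s}{k} (-1)^{s-k} k = 0$, we obtain (3).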

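For the special case $\alpha = 0$, where $X_n$ counts the number of removed edges
until the root is isolated, we obtain due to further simplifications

\[ E(X_n^s) = \frac{n^s}{\log^s n} + \big((s+1) H_s - \gamma s\big) \frac{n^s}{\log^{s+1} n} + O\Big(\frac{n^s}{\log^{s+2} n}\Big), \quad \text{for } s \ge 1, \]

\[ E([X_n - E(X_n)]^s) = \frac{(-1)^s}{s(s-1)} \frac{n^s}{\log^{s+1} n} + O\Big(\frac{n^s}{\log^{s+2} n}\Big), \quad \text{for } s \ge 2. \]

Here $\gamma$ denotes as usual the Euler constant.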

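We note also the case $\alpha = 1$, where removing an edge of a tree T costs
exactly the size |T| of T:

\[ E(X_n^s) = \frac{1}{2^s} \frac{n^{2s}}{\log^s n} + \frac{\frac{1}{2} H_s + (2s+1) H_{2s} - (1 + 2\gamma) s}{2^{s+1}} \frac{n^{2s}}{\log^{s+1} n} + O\Big(\frac{n^{2s}}{\log^{s+2} n}\Big), \quad \text{for } s \ge 1, \]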



E([X„ - E(X„)]^) = (1 + ^) + 0 (i^) 



for 5 > 2. 



We want to remark further that with the approach used here, one can also deal 
with logarithmic toll functions tn = log^ n and compute the asymptotic behaviour 
of the moments E(X^), where again one cannot deduce a limiting distribution. 







6. Two-sided destructions 



Here we sketch the proof of Theorem 2.2 concerning the asymptotic behaviour of
the s-th moments of the total costs $Y_n$ when destroying random recursive trees
with toll functions $t_n = n^\alpha$, for n ≥ 2 with $\alpha > 0$ and $t_1 = 0$. Again choosing $t_1 = 0$
instead of $t_1 = 1$ has no influence on the stated asymptotic behaviour ($Y_n^* = Y_n + n$,
if $Y_n^*$ measures the total costs with toll function $t_n = n^\alpha$ for n ≥ 1).

To reduce extracting coefficients from (15) to an application of the convolution
formula (16), we write equation (15) as



aW(z) 









f r^^\t)dt 

Jt=o 



{l-z)L{z) il-zyL^z)J,^, 
2 

(T^ 



(25) 



where is given by (13) with there appearing functions t{z) = Xln >2 

To start the proof of the expansion (4) by induction, we consider AI^]( 2 :) = 
Yln>i ^ ~ obtain as in Section 5: A^^{z) = t{z) © — 1 — L{z)] and 

we can use the expansion for [z'^]A^\z) given there. Also ( 21 ) is of interest. 

Thus [z'^] + 0(n^"^) and we obtain with (16) and (19) 



1 



f r^^^{t)dt = 
Jt=o 



n 



a-t-l 






(1 - zYL‘^{z) ' Oi{a -h 1 ) log^ n 

and by other two applications of the convolution formula also 
1 1 



(26) 









fjrilml = (27) 

Jt=o (1 - A=o ^log n7 



Adding (21), (26) and (27), we obtain finally 



aL'Ii = [2 "1AW(^)=- 






a -h 1 logn ^a(a + 1) (a + 1)^ ^ a -h 1 ^ log 



( — 

\a(a -1- 



- + 



+ 



^(o + 1 ) \ 



a-\-l 



a +1 



V log^ n / 



Substituting n — 1 for n, it follows that expansion (4) is valid for s = 1. 

Now we assume that for a given s > 2 , all moments E(F^) = aIT^ with 
1 < r < s have the expansion (4). To compute the asymptotic growth of the 
coefficients [z^]A^^{z), we use [z'^]t^^^{z) = and the following consequences 
of (16), (17) and (19), where the 7 ^^ are given in Theorem 2.2: 



r(s2(o + 1) + l)r(s3(a + 1) - 1) 



(a + l)-3+33r((s2 + S 3 ){a + 1)) log*^+*3 






for 0 < S2, 53 < s, 



n 



S3(a+1)-1 



(a + 1)^3 (s3{a + 1) - 1) log ®3 n 



(l + 0(j^^)^, for 52 = 0, 0 < 53 < 5, 



n 



,S2(o: + l) 



+ 7s2 



n 



52(0:4-1) 



log 



•S2 + 1 , 



(a + 1)^2 log ®2 n 



1 -f O(-), for 52 == 0, 53 = 0. 



S2 < 5, 53 = 0, 







Thus for si + S2 + S3 = s and the restrictions 0 < S2, S3 < s, we can describe 
the growth of the summands of [z'^]r^^\z): 



\zl\ ]t 

> Si, S2, S3 






^ / s \ r(s2(g + 1) + i)r(33(g + 1) - 1) 






si,S2, Say (a + l)^2+s3r((52 + S3)(a + l)) , 

for 0 < S2,S3 < s, 

S \ 1 jj^soc-\-S3-l I 

s \ 1 



^sc,+S2 

+ 1 17.2; 



sij {a + 1)^2 log®2 n ^si J ^ n 

(l + 0(i)), fors2 = 0, S3 = 0. 



< s, S3 = 0, 



Therefore the maximum is obtained when choosing S3 = 0 and S2 = s — 1, 
but additionally the instances si = 0 and 0 < S2, S3 < s give contributions to the 
second order term. We get thus the expansion 

s(a+l)-l , 






^s(a+l)~l _ ^s(a+l)-l 



(a + 1)* 1 log® ^ n log® n ' V log®'*'^ n 

•,i ~ . /s\ r’(j{“+l) + l)r((s— j)(a + l) — 1) rpr- -1 j- J 1 

With 7s := S7s-i + 2^,=i Lj— ^—7 ^ -■ This gives by repeatedly 

(o:+l)^r ^ s(q:+ 1) ) 

applying (16) and (19) the following expansions: 

r..nl rl°l(z) _ 1 n»<°+^) , ( »(s(g+l)) , 1 , 7., ) , (O Z' illlflll ) 

I- J (l-z)L(z) (a+1)^ log^ n ' ^ (a+l)^ (a + l)^ + l ' s(a+l) y log^ + 1 n ' yiog^+^^y? 



(^)- 



[z^ 



'{l^zYL^z) 
2 



f r^^\t)dt = 

Jt~Q 



1 



n 



s(q;+1) 



(a 4- 1 )^(s(q; + 1) - 1) log^+^ n 

s(a + l) 



s(a+l) . 

^°(log®+2n)’ 



Lo (l~t)Lg^ “ °(log®+^n)- 



^ ‘a-zY 

Combining these results and using (25) we obtain 






1 



oS(« + l) 



■ + 7si 



s(a+l) 



+ 0 



/ \ 

V log^’^^ n / ’ 



(a+l)" log®n ^ '"log®+^n ' “Vlog® 
where 7^ satisfies for s > 1 the following recurrence with inital value 70 := 0: 

1 1 



^ I j : ^ 

a + 1 (a + 1 )^ (a + l)^+^ (a + l)^(s(a + 1 ) - 1 ) 



s — 1 



+ 



i (S. r(i(^ + 1) + i)r((^ - j){a + 1) - 1 ) 
+ r(s(a + l) + l) 



Solving this recurrence leads exactly to the expression for $\gamma_s$ given in Theorem 2.2.
Computations analogous to those in Section 5 lead from (4) to the stated result (5) for
the centered moments.



Acknowledgment 

The author thanks the anonymous referee for many valuable comments. 







References 

[1] J. A. Fill, P. Flajolet and N. Kapur, Singularity Analysis, Hadamard Products and
Tree Recurrences, submitted, 2003. Available at
http://algo.inria.fr/flajolet/Publications/FiFlKa03.pdf.

[2] P. Flajolet and A. Odlyzko, Singularity Analysis of Generating Functions, SIAM
Journal on Discrete Mathematics 3, 216-240, 1990.

[3] R. Graham, D. Knuth and O. Patashnik, Concrete Mathematics (Second Edition),
Addison Wesley, 1994.

[4] D. Knuth and A. Schönhage, The expected linearity of a simple equivalence
algorithm, Theoretical Computer Science 6, 281-315, 1978.

[5] H. Mahmoud and R. Smythe, A Survey of Recursive Trees, Theoretical Probability
and Mathematical Statistics 51, 1-37, 1995.

[6] A. Meir and J. W. Moon, Cutting down random trees, Journal of the Australian
Mathematical Society 11, 313-324, 1970.

[7] A. Meir and J. W. Moon, Cutting down recursive trees, Mathematical Biosciences
21, 173-181, 1974.

[8] A. Panholzer, Non-crossing trees revisited: cutting down and spanning subtrees,
Discrete Mathematics and Theoretical Computer Science, 2003.
http://dmtcs.loria.fr/proceedings/html/pdfpapers/dmAC0125.pdf.

Alois Panholzer

Institut für Diskrete Mathematik und Geometrie, Technische Universität Wien,
Wiedner Hauptstraße 8-10, A-1040 Wien, Austria,
e-mail: Alois.Panholzer@tuwien.ac.at




Part V 

Probability 




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Restrictions on the Position of the 
Maximum/Minimum in a Geometrically 
Distributed Sample 

Margaret Archibald 

ABSTRACT: We address the question: “What is the probability that the
maximum in a geometrically distributed sample occurs in the first d positions of a
word?” This can be considered in the strict sense and also when repeats are allowed,
both in the first d positions, and in the rest of the word. The same situations 
are then considered for the minimum. Another generalisation is to consider the 
smaller probability that the minimum of the first d positions is the maximum of 
the remaining letters. This can also be interpreted both in the strict and weak 
sense, as can the last parameter to be discussed, namely: when both the minimum 
value and the maximum value of all letters in the word are situated in the first d 
places. 



1. Introduction 



Given a word whose letters are natural numbers, we consider these letters to occur
independently and with geometric probability. So for p + q = 1, each letter i
appears in the word with probability $pq^{i-1}$. In this way larger letters occur less
frequently than smaller letters. For all of the problems discussed below, we split
the word into two parts: the first d letters of the word and the rest. All of these
ideas can be interpreted in the strict and weak sense. For the special case d = 1,
a similar concept has been looked at for compositions, see [5]. We make use of
generating functions and Rice’s method [3, 8, 6, 7, 9, 10, 4, 1].

We first consider the scenario where the largest value in a sequence is situated
in the first d letters. (We take words of length n and thus require d < n.) Note that
we write $\mathbf{i} = \sqrt{-1}$ to distinguish it from the i we use as an index. We also write
$Q = q^{-1}$. By ‘strict’ we mean that a letter only occurs once and ‘weak’ means
that the letters can occur any number of times (but at least once). We use (s,s)
to denote the (strict, strict) cases, (w,s) to denote the (weak, strict) cases etc.



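Theorem 1.1. The probability that the maximum value in a word of length n is in
the first d positions is

\[ Max_{(s,s)}(n) \sim Max_{(w,s)}(n) \sim \frac{dp}{Ln}\big(1 + \psi(\log_Q n)\big) \quad \Big( = \frac{d(Q-1)}{LQn}\big(1 + \psi(\log_Q n)\big)\Big), \]

where L = log Q and $\psi(x) = \sum_{k \ne 0} \Gamma\big(1 - \frac{2k\pi \mathbf{i}}{L}\big)\, e^{2k\pi \mathbf{i} x}$. If we allow this maximum value
to repeat in the rest of the word, then the probability is

\[ Max_{(s,w)}(n) \sim Max_{(w,w)}(n) \sim \frac{d(Q-1)}{Ln}\big(1 + \psi(\log_Q n)\big). \]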







We then do the same for minima, only this time there are four different 
solutions and no fluctuations. 



Theorem 1.2. The probability that the minimum value in a word of length n is in
the first d positions is

\[ Min_{(s,s)}(n) = \frac{(Q-1)d}{Q^n - 1} \]

if we allow this minimum to occur only once (strict, strict). If we allow the minimum
to recur in the first d positions but not in the rest of the word (weak, strict)
we have

\[ Min_{(w,s)}(n) = \frac{Q^d - 1}{Q^n - 1} \sim \frac{Q^d - 1}{Q^n}. \]

For the last two we let the min occur any number of times in the rest of the word,
but for the first case we allow only one min and for the second, more than one.
We get (strict, weak)

\[ Min_{(s,w)}(n) = \frac{Q^{n-d}\, d\, (Q-1)}{Q^n - 1} \sim \frac{d(Q-1)}{Q^d} \quad \text{as } n \to \infty \]

or (weak, weak)

\[ Min_{(w,w)}(n) = \frac{1 - Q^{-d}}{1 - Q^{-n}} \sim 1 - Q^{-d} \quad \text{as } n \to \infty. \]
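
The four cases for the minimum can be checked numerically. The sketch below is an illustration under stated assumptions: the closed forms coded in `min_closed_forms` are those obtained by summing the geometric series over the value k of the minimum (they should be read as a reconstruction, not as a quotation), the brute-force check truncates the alphabet at a hypothetical `max_letter`, and all function names are invented here.

```python
from itertools import product
from math import prod

def min_closed_forms(n, d, q):
    """Closed forms for the four minimum cases, with Q = 1/q (assumed reading)."""
    Q = 1 / q
    return {
        "ss": (Q - 1) * d / (Q**n - 1),
        "ws": (Q**d - 1) / (Q**n - 1),
        "sw": Q**(n - d) * d * (Q - 1) / (Q**n - 1),
        "ww": (1 - Q**(-d)) / (1 - Q**(-n)),
    }

def min_brute_force(n, d, q, max_letter=30):
    """Sum the probabilities of all words over the truncated alphabet 1..max_letter."""
    p = 1 - q
    weight = {i: p * q**(i - 1) for i in range(1, max_letter + 1)}
    out = {"ss": 0.0, "ws": 0.0, "sw": 0.0, "ww": 0.0}
    for word in product(range(1, max_letter + 1), repeat=n):
        pr = prod(weight[i] for i in word)
        m = min(word)
        head, tail = word[:d].count(m), word[d:].count(m)
        if head == 1 and tail == 0:
            out["ss"] += pr      # minimum once, in the first d positions
        if head >= 1 and tail == 0:
            out["ws"] += pr      # may repeat in the first d, not in the rest
        if head == 1:
            out["sw"] += pr      # once in the first d, any number in the rest
        if head >= 1:
            out["ww"] += pr      # at least once in the first d
    return out

print(min_closed_forms(3, 2, 0.5))
print(min_brute_force(3, 2, 0.5))
```

The truncation discards only words containing a letter larger than `max_letter`, whose total probability is at most $n q^{\text{max\_letter}}$, so the two dictionaries should agree up to that error.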



Another generalisation is that the minimum value of the first d letters is 
greater than (and possibly equal to) all other values in the word. 



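Theorem 1.3. If we define $\psi_d(x) := \sum_{k\neq 0}\Gamma(d-\chi_k)\,e^{2k\pi\mathbf{i}x}$ (with $\chi_k = 2k\pi\mathbf{i}/L$), then the probability that
the minimum value of the first d is greater than all the remaining letters in the
word is

\[ \mathrm{Minmax}_{(s,s)}(n) \sim \frac{(Q-1)\,d!}{Ln^dQ^d} + \frac{(Q-1)\,d}{Ln^dQ^d}\,\psi_d(\log_Q n) \]

or

\[ \mathrm{Minmax}_{(w,s)}(n) \sim \frac{(1-Q^{-d})\,(d-1)!}{Ln^d} + \frac{1-Q^{-d}}{Ln^d}\,\psi_d(\log_Q n) \]

depending on whether the minimum is strict or weak (i.e., it occurs once or more
than once). The probabilities for the same two cases when we require that the
minimum is greater than or equal to all letters in the rest of the word are

\[ \mathrm{Minmax}_{(s,w)}(n) \sim \frac{(Q-1)\,d!}{Ln^d} + \frac{(Q-1)\,d}{Ln^d}\,\psi_d(\log_Q n) \]

and

\[ \mathrm{Minmax}_{(w,w)}(n) \sim \frac{(Q^d-1)\,(d-1)!}{Ln^d} + \frac{Q^d-1}{Ln^d}\,\psi_d(\log_Q n), \]

respectively.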



Finally, the probability that a word of length n has both its maximum and 
minimum in the first d letters is considered. In this case we only list two results, 
but there are many more cases (see section 5). 




Restrictions on the position of the maximum/minimum 






Theorem 1.4. The maximum and minimum values will both occur in the first d
letters of a word of length n with probability

\[ \mathrm{Both}_{(s)}(n) \sim \frac{d(d-1)\,p^2Q}{LnQ^n}\bigl(1+\psi(\log_Q n)\bigr), \]

(with $\psi(x)$ as in Theorem 1.1) if neither is allowed to appear again anywhere in the
word. If both are allowed to appear more than once in the word, this probability
becomes

\[ \mathrm{Both}_{(w)}(n) \sim \frac{d(d-1)\,p^2Q}{Ln}\bigl(1+\psi(\log_Q n)\bigr) \quad \text{as } n \to \infty. \]



2. Maximum in the first d positions 

This idea can be viewed in four different ways, all of which end up with similar 
results asymptotically. These asymptotic results are proportional to d, as there are
d chances that the maximum k will be where we want it to be. We can have (strict,
strict) where k can only appear once in the whole word. Next, we can allow k to 
appear elsewhere in the first d positions, but not in the rest of the word. We will 
call this the (weak, strict) case. Alternatively, we can allow the letter k to appear 
any number of times in the rest of the word, but not more than once in the first d 
places - i.e., the (strict, weak) case. Lastly we can let k be anywhere in the word as 
long as it occurs at least once in the first d places. We call this case (weak, weak). 

For the purposes of keeping the paper a manageable size, we do not include 
all calculations. Instead only one example of the complete method is given. For all 
parameters the other cases are similar. For the maximum in the first d positions, 
the (strict, strict) case is given in detail, and the others are described symbolically. 



2.1. Maximum in the first d positions - (strict, strict) 

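In this case we have strictly only one k in the word, which has to appear somewhere
in the first d places. All other letters can only be 1, 2, ... up to k - 1. Symbolically
we have

\[ A_k = \bigcup_{i=0}^{d-1}\{1,\dots,k-1\}^{i}\,k\,\{1,\dots,k-1\}^{d-1-i}, \qquad \bigcup_{k\ge 1}A_k\{1,\dots,k-1\}^{*}, \]

and so our generating function becomes

\[ F^{(s,s)}(z) = \sum_{k\ge 1}\sum_{i=0}^{d-1}\Bigl(\sum_{j=1}^{k-1}zpq^{j-1}\Bigr)^{i} zpq^{k-1}\Bigl(\sum_{j=1}^{k-1}zpq^{j-1}\Bigr)^{d-1-i}\frac{1}{1-\sum_{j=1}^{k-1}zpq^{j-1}} \]
\[ \phantom{F^{(s,s)}(z)} = \sum_{k\ge 1}\sum_{i=0}^{d-1}\sum_{j\ge 0} zpq^{k-1}\bigl(z(1-q^{k-1})\bigr)^{d-1+j}. \]

Thus

\[ [z^n]F^{(s,s)}(z) = \sum_{k\ge 1}\sum_{i=0}^{d-1}(1-q^{k-1})^{n-1}pq^{k-1} = dp\sum_{k\ge 1}\sum_{h=0}^{n-1}\binom{n-1}{h}(-1)^h q^{(k-1)(h+1)} \]
\[ \phantom{[z^n]F^{(s,s)}(z)} = d\sum_{h=0}^{n-1}\binom{n-1}{h}(-1)^h\,\frac{Q^{h+1}(1-Q^{-1})}{Q^{h+1}-1}. \]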



We can now use complex analysis to evaluate the alternating sum asymptotically. 
The method is called ‘Rice’s method’ and the lemma we use (which allows us to 
express a sum such as this as an integral, see [2, 8, 11]) is 



Lemma 2.1. Let C be a curve surrounding the points 1, 2, ..., n in the complex
plane, and let f(z) be analytic inside C. Then

\[ \sum_{k=1}^{n}\binom{n}{k}(-1)^k f(k) = -\frac{1}{2\pi\mathbf{i}}\oint_C [n;z]\,f(z)\,dz, \]

where

\[ [n;z] = \frac{(-1)^{n-1}\,n!}{z(z-1)\cdots(z-n)} = \frac{\Gamma(n+1)\Gamma(-z)}{\Gamma(n+1-z)}. \]

By extending the contour of integration, it turns out that under suitable
growth conditions (see [2]) the asymptotic expansion of our alternating sum is
given by

\[ \sum \mathrm{Res}\bigl([n;z]f(z)\bigr) + \text{smaller order terms}, \]

where the sum is taken over all poles different from 1, ..., n. Poles that lie more
to the left lead to smaller terms in the asymptotic expansion.
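The two expressions for the Rice kernel [n; z] can be compared numerically at a real, non-integer z. This is a small sketch using only the standard library; the function names are illustrative. It also evaluates the kernel at z = -1, the value used for the leading pole in the (strict, strict) computation.

```python
import math

def rice_kernel_product(n, z):
    """[n; z] = (-1)^(n-1) n! / (z (z-1) ... (z-n))."""
    prod = 1.0
    for m in range(n + 1):
        prod *= (z - m)
    return (-1) ** (n - 1) * math.factorial(n) / prod

def rice_kernel_gamma(n, z):
    """[n; z] = Gamma(n+1) Gamma(-z) / Gamma(n+1-z), for real non-integer z."""
    return math.gamma(n + 1) * math.gamma(-z) / math.gamma(n + 1 - z)

if __name__ == "__main__":
    print(rice_kernel_product(4, 2.5), rice_kernel_gamma(4, 2.5))
    # Kernel at z = -1 for a sum running to n - 1 (here n = 7).
    print(rice_kernel_product(6, -1.0), 1 / 7)
```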

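In our case, the sum begins at zero and ends at n - 1, so we start with a
contour surrounding the points 0, 1, ..., n - 1. We use Rice on $f(z) = \frac{Q^{z+1}}{Q^{z+1}-1}$
and consider poles at z + 1 = 0 and $z + 1 = \chi_k$, where $\chi_k := 2k\pi\mathbf{i}/L$ for all $k \in \mathbb{Z}\setminus\{0\}$
(L = log Q). To expand f(z) about z = -1, we let $\varepsilon = z + 1$. Then

\[ f(z) = \frac{Q^{\varepsilon}}{Q^{\varepsilon}-1} = \frac{e^{\varepsilon L}}{e^{\varepsilon L}-1} \sim \frac{1}{\varepsilon L}, \]

and the residue is $\frac{1}{L}$. This is combined with the kernel contribution

\[ [n-1;-1] = \frac{(-1)^{n-1-1}(n-1)!}{(-1)(-1-1)\cdots(-1-(n-1))} = \frac{(-1)^{n-2}(n-1)!}{(-1)^{n}\,n!} = \frac{1}{n}, \]

so that the first pole gives us $\frac{1}{Ln}$. The remainder of the poles use $\varepsilon = z+1-\chi_k$,
so (note that $Q^{\chi_k} = 1$)

\[ f(z) = \frac{Q^{\varepsilon+\chi_k}}{Q^{\varepsilon+\chi_k}-1} = \frac{Q^{\varepsilon}}{Q^{\varepsilon}-1} \sim \frac{1}{\varepsilon L}, \]

and the residue is again $\frac{1}{L}$. The kernel is

\[ [n-1;\chi_k-1] = \frac{\Gamma(n-1+1)\Gamma(-\chi_k+1)}{\Gamma(n-1+1-\chi_k+1)} = \frac{\Gamma(1-\chi_k)\Gamma(n)}{\Gamma(n-\chi_k+1)} \]
\[ \sim \Gamma(1-\chi_k)\,n^{\chi_k-1} = \frac{1}{n}\Gamma(1-\chi_k)\,e^{\chi_k\log n} = \frac{1}{n}\Gamma(1-\chi_k)\,e^{2k\pi\mathbf{i}\log_Q n}, \]

which means that the remaining poles give $\frac{1}{Ln}\sum_{k\neq 0}\Gamma(1-\chi_k)\,e^{2k\pi\mathbf{i}\log_Q n}$. In total, restoring the factor $d(1-Q^{-1}) = pd$,
we have the expected value asymptotic to

\[ \frac{pd}{Ln}\bigl(1+\psi(\log_Q n)\bigr) \quad \text{where} \quad \psi(x) := \sum_{k\neq 0}\Gamma(1-\chi_k)\,e^{2k\pi\mathbf{i}x}. \]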
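The leading term of the (strict, strict) case can be compared with a direct evaluation of the sum over the maximum value k. The sketch below assumes the coefficient d p Σ_k q^{k-1}(1 - q^{k-1})^{n-1} and the leading term pd/(Ln) with L = log(1/q); the function names and the truncation `terms` are illustrative. For q = 1/2 the fluctuating part is numerically tiny, so the ratio should be very close to 1.

```python
import math

def max_strict_strict(p, n, d, terms=400):
    """d * sum_k p q^(k-1) (1 - q^(k-1))^(n-1): a unique maximum k in the first d places."""
    q = 1.0 - p
    total = 0.0
    for k in range(1, terms + 1):
        x = q ** (k - 1)                   # P(letter >= k)
        total += p * x * (1.0 - x) ** (n - 1)
    return d * total

def max_leading_term(p, n, d):
    """Leading asymptotic term p d / (L n), ignoring the fluctuation psi."""
    L = math.log(1.0 / (1.0 - p))
    return p * d / (L * n)

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, max_strict_strict(0.5, n, 3) / max_leading_term(0.5, n, 3))
```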



2.2. Maximum in the first d positions - other cases

For the (weak, strict) scenario we relax the requirements slightly, and allow more
than one k in the first d places. This changes the symbolic expression $A_k$ to

\[ A_k = \bigcup_{i=0}^{d-1}\{1,\dots,k-1\}^{i}\,k\,\{1,\dots,k\}^{d-1-i}. \]

Since we must be sure that k appears at least once, we secure the position of the
first k. Again the union is taken over all k - i.e., over all words where k is the
maximum, and we include the rest of the word ($\{1,\dots,k-1\}^{*}$) as before. Thus
the generating function is

\[ F^{(w,s)}(z) = \sum_{k\ge 1}\sum_{i=0}^{d-1}\Bigl(\sum_{j=1}^{k-1}zpq^{j-1}\Bigr)^{i} zpq^{k-1}\Bigl(\sum_{j=1}^{k}zpq^{j-1}\Bigr)^{d-1-i}\frac{1}{1-\sum_{j=1}^{k-1}zpq^{j-1}}. \]

If we swap these requirements round to look at the (strict, weak) case, we assume
that k only appears once in the first d positions, but can occur any number of
times in the rest of the word. We keep the same $A_k$ as in the (strict, strict) case,
but extend the rest of the symbolic equation to include k. The generating function
is thus

\[ F^{(s,w)}(z) = \sum_{k\ge 1}\sum_{i=0}^{d-1}\Bigl(\sum_{j=1}^{k-1}zpq^{j-1}\Bigr)^{i} zpq^{k-1}\Bigl(\sum_{j=1}^{k-1}zpq^{j-1}\Bigr)^{d-1-i}\frac{1}{1-\sum_{j=1}^{k}zpq^{j-1}}. \qquad (1) \]

For the (weak, weak) case there is at least one k in any position between 1 and
d. All other letters in the word can be anything in the alphabet 1, 2, ..., k. We
express the first d letters symbolically as in the (weak, strict) case, and the rest as
in the (strict, weak) case. In terms of generating functions, we thus have the same
as in (1) except that the second sum on j ends at k and not k - 1.



3. Minimum in the first d positions 

We have four different scenarios here, as we can apply our weak/strict classifica- 
tions to the first d letters as well as to the rest of the word. We look at the same 
four cases as highlighted above. 

The calculations here are shorter, as the result can be evaluated exactly rather 
than asymptotically. As a result of this we do not need to use Rice’s method and 
also the solution has no fluctuations. This time only the (weak, strict) case is given 
in detail, and the others are mentioned more briefly. 







3.1. Minimum in the first d positions - (weak, strict) 

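Suppose we allow the minimum value j of a word of length n to recur (it must
occur once) within the first d but not thereafter. Symbolically we express the first
d letters of any such word as

\[ A_j = \bigcup_{i=0}^{d-1}\{j+1,\dots\}^{i}\,j\,\{j,j+1,\dots\}^{d-1-i} \]

by fixing the first j and allowing further j's to its right. This can be substituted
into $\bigcup_{j\ge 1}A_j\{j+1,\dots\}^{*}$, where $\{j+1,\dots\}^{*}$ symbolises the rest of the word, in which no j's are
present. We then have

\[ F^{(w,s)}(z) = \sum_{j\ge 1}\sum_{i=0}^{d-1}\Bigl(\sum_{k=j+1}^{\infty}zpq^{k-1}\Bigr)^{i} zpq^{j-1}\Bigl(\sum_{k=j}^{\infty}zpq^{k-1}\Bigr)^{d-1-i}\frac{1}{1-\sum_{k=j+1}^{\infty}zpq^{k-1}} \]
\[ \phantom{F^{(w,s)}(z)} = \sum_{j\ge 1}\sum_{i=0}^{d-1}\sum_{k\ge 0}(zq^{j})^{i}\,zpq^{j-1}\,(zq^{j-1})^{d-1-i}\,(zq^{j})^{k} \]

and thus

\[ [z^n]F^{(w,s)}(z) = \sum_{j\ge 1}\sum_{i=0}^{d-1}p\,q^{jn-d+i} = p\sum_{i=0}^{d-1}\frac{Q^{d-i}}{Q^{n}-1} = \frac{Q^{d}-1}{Q^{n}-1}. \]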



3.2. Minimum in the first d positions - other cases 

Suppose we consider the probability that the minimum value j appears in the
first d positions of the word but never again (i.e., (strict, strict)). Symbolically we
express the first d letters of any such word as

\[ A_j = \bigcup_{i=0}^{d-1}\{j+1,\dots\}^{i}\,j\,\{j+1,\dots\}^{d-1-i}, \]

which differs from the previous $A_j$ only because the last bracket starts from j + 1.
All possible words of this type would be $\bigcup_{j\ge 1}A_j\{j+1,\dots\}^{*}$. The (strict, weak)
case uses the same $A_j$, but all possible words would be $\bigcup_{j\ge 1}A_j\{j,j+1,\dots\}^{*}$, so
that we have the generating function as

\[ \sum_{j\ge 1}\sum_{i=0}^{d-1}\Bigl(\sum_{k=j+1}^{\infty}zpq^{k-1}\Bigr)^{i} zpq^{j-1}\Bigl(\sum_{k=j+1}^{\infty}zpq^{k-1}\Bigr)^{d-1-i}\frac{1}{1-\sum_{k=J}^{\infty}zpq^{k-1}}, \]

where J is j + 1 or j respectively. Similar small changes are needed for the final
option - (weak, weak). We let j occur anywhere, as long as it appears at least once
in the first d places. Again we have $A_j = \bigcup_{i=0}^{d-1}\{j+1,\dots\}^{i}\,j\,\{j,j+1,\dots\}^{d-1-i}$,
so the first j's position is fixed, but others may occur to the right. All such words
are represented by $\bigcup_{j\ge 1}A_j\{j,j+1,\dots\}^{*}$.

Again, for a fixed d and large n, we can see that it is the classification of the
'rest' of the word that takes precedence, i.e., the (strict, strict) and (weak, strict)
cases are of a smaller order of magnitude than the (strict, weak) and (weak, weak)
cases.



4. The minimum of the first d is the maximum of the rest 

We require the minimum value (j) of the first d letters to be either strictly greater 
than or greater than or equal to the maximum of the rest. Again there are four 
cases, which are all combinations of the pair ‘weak’ and ‘strict’. The complete 
working is shown for the (strict, weak) case. 

4.1. Min of first d is greater than or equal to the rest - (strict, weak) 

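If we consider a word in which j is the strict minimum of the first d letters, so
that j occurs only once in the first d letters, but that we allow any of 1, 2, ..., j in
the rest of the word, then we can represent the first d letters symbolically as

\[ A_j = \bigcup_{i=0}^{d-1}\{j+1,\dots\}^{i}\,j\,\{j+1,\dots\}^{d-1-i}, \]

which is part of the overall symbolic equation

\[ \bigcup_{j\ge 1}A_j\{1,\dots,j\}^{*}. \]

This translates into the generating function

\[ F^{(s,w)}(z) = \sum_{j\ge 1}\sum_{i=0}^{d-1}\Bigl(\sum_{h=j+1}^{\infty}zpq^{h-1}\Bigr)^{i} zpq^{j-1}\Bigl(\sum_{h=j+1}^{\infty}zpq^{h-1}\Bigr)^{d-1-i}\frac{1}{1-\sum_{l=1}^{j}zpq^{l-1}} \]
\[ \phantom{F^{(s,w)}(z)} = \sum_{j\ge 1}\sum_{i=0}^{d-1}\sum_{h\ge 0} zpq^{j-1}(zq^{j})^{d-1}\bigl(z(1-q^{j})\bigr)^{h}, \]

whose coefficients are

\[ [z^n]F^{(s,w)}(z) = \sum_{j\ge 1}\sum_{i=0}^{d-1}pq^{j-1}q^{j(d-1)}(1-q^{j})^{n-d} = \sum_{j\ge 1}\sum_{i=0}^{d-1}\sum_{h=0}^{n-d}\binom{n-d}{h}(-1)^h (Q-1)\,q^{j(d+h)} \]
\[ \phantom{[z^n]F^{(s,w)}(z)} = \sum_{i=0}^{d-1}\sum_{h=0}^{n-d}\binom{n-d}{h}(-1)^h\,\frac{Q-1}{Q^{d+h}-1}. \]

We consider the poles of $f(z) = \frac{Q-1}{Q^{z+d}-1}$ at z + d = 0 and at $z + d = \chi_k$. Expanding
around $\varepsilon := z + d$ gives

\[ f(z) = \frac{Q-1}{Q^{\varepsilon}-1} = \frac{Q-1}{e^{\varepsilon L}-1} \sim \frac{Q-1}{\varepsilon L} \]

with residue $\frac{Q-1}{L}$, and

\[ [n-d;-d] = \frac{(-1)^{n-d-1}(n-d)!}{(-d)(-d-1)\cdots(-d-(n-d))} = \frac{(n-d)!\,(d-1)!}{n!} \sim \frac{(d-1)!}{n^{d}}. \]

For $\varepsilon = z + d - \chi_k$,

\[ f(z) = \frac{Q-1}{Q^{\varepsilon+\chi_k}-1} = \frac{Q-1}{Q^{\varepsilon}-1} \sim \frac{Q-1}{\varepsilon L}, \]

so the residue is also $\frac{Q-1}{L}$, and

\[ [n-d;\chi_k-d] \sim \Gamma(d-\chi_k)\,n^{\chi_k-d} = \frac{1}{n^{d}}\Gamma(d-\chi_k)\,e^{2k\pi\mathbf{i}\log_Q n}. \]

Altogether as $n \to \infty$, the expected value tends to

\[ \sum_{i=0}^{d-1}\frac{(Q-1)(d-1)!}{Ln^{d}} + \sum_{i=0}^{d-1}\frac{Q-1}{Ln^{d}}\sum_{k\neq 0}\Gamma(d-\chi_k)\,e^{2k\pi\mathbf{i}\log_Q n} = \frac{(Q-1)\,d!}{Ln^{d}} + \frac{(Q-1)\,d}{Ln^{d}}\,\psi_d(\log_Q n), \]

where $\psi_d(x) := \sum_{k\neq 0}\Gamma(d-\chi_k)\,e^{2k\pi\mathbf{i}x}$.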
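A direct evaluation of the sum over j gives a quick sanity check of the leading term. This is a sketch under the assumption that the (strict, weak) coefficient is d Σ_j p q^{j-1} q^{j(d-1)} (1 - q^j)^{n-d} and that the leading term is (Q-1) d!/(L n^d); the function names and the truncation `terms` are illustrative. The ratio approaches 1 as n grows, up to an O(1/n) correction and the small fluctuation ψ_d.

```python
import math

def minmax_strict_weak(p, n, d, terms=400):
    """d * sum_j p q^(j-1) q^(j(d-1)) (1 - q^j)^(n-d)."""
    q = 1.0 - p
    total = 0.0
    for j in range(1, terms + 1):
        head = p * q ** (j - 1) * q ** (j * (d - 1))  # one j, d-1 letters above j
        tail = (1.0 - q ** j) ** (n - d)              # remaining letters are at most j
        total += head * tail
    return d * total

def minmax_leading_term(p, n, d):
    """Leading term (Q - 1) d! / (L n^d), ignoring the fluctuation psi_d."""
    Q = 1.0 / (1.0 - p)
    L = math.log(Q)
    return (Q - 1.0) * math.factorial(d) / (L * n ** d)

if __name__ == "__main__":
    for n in (200, 2000, 20000):
        print(n, minmax_strict_weak(0.5, n, 2) / minmax_leading_term(0.5, n, 2))
```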



4.2. Min of first d is greater than the rest - other cases

For the (strict, strict) case j occurs only once in the first d letters, and is strictly the
minimum there. It does not occur in the rest of the word either, and nor does any
letter larger than j. We have the same $A_j$ as above, and our total symbolic equation
is $\bigcup_{j\ge 1}A_j\{1,\dots,j-1\}^{*}$. We can translate this into the generating function

\[ \sum_{j\ge 1}\sum_{i=0}^{d-1}\Bigl(\sum_{h=j+1}^{\infty}zpq^{h-1}\Bigr)^{i} zpq^{j-1}\Bigl(\sum_{h=j+1}^{\infty}zpq^{h-1}\Bigr)^{d-1-i}\frac{1}{1-\sum_{l=1}^{j-1}zpq^{l-1}}. \]

For the (weak, strict) and (weak, weak) cases we allow j any number of times (but
at least once) in the first d letters. We have

\[ A_j = \bigcup_{i=0}^{d-1}\{j+1,\dots\}^{i}\,j\,\{j,j+1,\dots\}^{d-1-i} \quad \text{in all possible words} \quad \bigcup_{j\ge 1}A_j\{1,\dots,J\}^{*}, \]

where J is j - 1 or j depending on whether the second restriction is strict or weak.
Using the same notation, we translate this into the generating function(s)

\[ \sum_{j\ge 1}\sum_{i=0}^{d-1}\Bigl(\sum_{h=j+1}^{\infty}zpq^{h-1}\Bigr)^{i} zpq^{j-1}\Bigl(\sum_{h=j}^{\infty}zpq^{h-1}\Bigr)^{d-1-i}\frac{1}{1-\sum_{l=1}^{J}zpq^{l-1}}. \]



5. Min and Max both in first d positions 

As in the previous cases, we start by looking at a symbolic equation which in this 
case represents all words with both the maximum and minimum in the first d 
places. This is then translated into a generating function whose coefficient of z'^ 
will be the probability that a word of length n has this requirement. 

There are sixteen different cases to consider, since we can have strict or weak
for the maximum or minimum in the first d places or the rest of the word. The
diagram

                 First d letters     Remaining letters
    Maximum      strict/weak         strict/weak
    Minimum      strict/weak         strict/weak

shows why this is so. In this paper we include some details for only the weakest
case (see section 5.2 below), where the max and min can appear any number of
times, as long as they first occur at least once each in the first d positions. Two
other cases are mentioned.



5.1. Max and min occur only once each 

We require that $2 \le d < n$, and that d is fixed. We can express the first d letters
of such a word as

\[ B_{jk} = \bigcup_{i=0}^{d-2}\bigcup_{h=0}^{d-2-i}\{j+1,\dots,k-1\}^{i}\,j\,\{j+1,\dots,k-1\}^{h}\,k\,\{j+1,\dots,k-1\}^{d-2-i-h} \]

if the minimum occurs first (i.e., to the left of the maximum) and

\[ B_{kj} = \bigcup_{i=0}^{d-2}\bigcup_{h=0}^{d-2-i}\{j+1,\dots,k-1\}^{i}\,k\,\{j+1,\dots,k-1\}^{h}\,j\,\{j+1,\dots,k-1\}^{d-2-i-h} \]

if the maximum occurs first. Since we want to include all words (for all values of
j < k), we must consider

\[ \bigcup_{j\ge 1}\bigcup_{k>j}B\,\{j+1,\dots,k-1\}^{*}, \]

where B could be $B_{jk}$ or $B_{kj}$ depending on whether the minimum or maximum
occurs first. Note that in terms of generating functions, both give the same results
due to multiplicative commutativity. Thus translation into generating functions
for either case is the same, and so to involve all cases we must include a factor of
2:

\[ F^{(s)}(z) := 2\sum_{j\ge 1}\sum_{k>j}\sum_{i=0}^{d-2}\sum_{h=0}^{d-2-i}\Bigl(\sum_{l=j+1}^{k-1}zpq^{l-1}\Bigr)^{i} zpq^{j-1}\Bigl(\sum_{l=j+1}^{k-1}zpq^{l-1}\Bigr)^{h} zpq^{k-1}\Bigl(\sum_{l=j+1}^{k-1}zpq^{l-1}\Bigr)^{d-2-i-h}\frac{1}{1-\sum_{l=j+1}^{k-1}zpq^{l-1}}. \]

Note that the order of the leading term in the first case of Theorem 1.4 is $\frac{1}{nQ^{n}}$, the
product of the results in the previous cases, where we had the strict maximum of
order $\frac{1}{n}$ and the strict minimum of order $\frac{1}{Q^{n}}$.



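5.2. Max and min can occur any number of times

We now look at the other end of the scale in more detail. As long as the max
and min occur at least once each in the first d places, they are allowed to appear
again any number of times anywhere in the word. Again, we split this into two
cases symbolically (according to whether j or k appears first), but one generating
function will do for both, if it includes the factor of 2:

\[ F^{(w)}(z) := 2\sum_{j\ge 1}\sum_{k>j}\sum_{i=0}^{d-2}\sum_{h=0}^{d-2-i}\Bigl(\sum_{l=j}^{k}zpq^{l-1}\Bigr)^{i} zpq^{j-1}\Bigl(\sum_{l=j}^{k}zpq^{l-1}\Bigr)^{h} zpq^{k-1}\Bigl(\sum_{l=j}^{k}zpq^{l-1}\Bigr)^{d-2-i-h}\frac{1}{1-\sum_{l=j}^{k}zpq^{l-1}} \]
\[ = 2\sum_{j\ge 1}\sum_{k>j}\sum_{i=0}^{d-2}\sum_{h=0}^{d-2-i}\sum_{l\ge 0} z^{2}p^{2}q^{j+k-2}\bigl(z(q^{j-1}-q^{k})\bigr)^{d-2+l} \]
\[ = d(d-1)\sum_{j\ge 1}\sum_{k>j}\sum_{l\ge 0} z^{2}p^{2}q^{j+k-2}\bigl(z(q^{j-1}-q^{k})\bigr)^{d-2+l}. \]

For the expected value, we are interested in the coefficients:

\[ [z^n]F^{(w)}(z) = d(d-1)\sum_{j\ge 1}\sum_{k>j}p^{2}q^{j+k-2}(q^{j-1}-q^{k})^{n-2} \]
\[ = d(d-1)\sum_{l=0}^{n-2}\binom{n-2}{l}(-1)^{l}\sum_{j\ge 1}\sum_{k>j}p^{2}q^{j+k-2}\,q^{kl}\,q^{(j-1)(n-2-l)} \]
\[ = d(d-1)\sum_{l=0}^{n-2}\binom{n-2}{l}(-1)^{l}\,p^{2}\,\frac{q^{2l+1}}{1-q^{l+1}}\,\frac{1}{1-q^{n}} \]
\[ = d(d-1)\,\frac{p^{2}Q^{n}}{Q^{n}-1}\sum_{l=0}^{n-2}\binom{n-2}{l}(-1)^{l}\,\frac{1}{Q^{l}(Q^{l+1}-1)}. \]

Again, we use Rice's method, with the function $f(z) := \frac{1}{Q^{z}(Q^{z+1}-1)}$. We consider
the simple poles at z + 1 = 0 and $z + 1 = \chi_k$, $k \neq 0$. For the former, let $\varepsilon = z + 1$,
then

\[ f(z) = \frac{1}{Q^{\varepsilon-1}(Q^{\varepsilon}-1)} = \frac{1}{Q^{\varepsilon-1}(e^{\varepsilon L}-1)} \sim \frac{1}{Q^{-1}\varepsilon L}, \]

so the residue is $\frac{Q}{L}$. The kernel is

\[ [n-2;-1] = \frac{(-1)^{(n-2)-1}(n-2)!}{(-1)(-1-1)\cdots(-1-(n-2))} = \frac{(-1)^{n-3}(n-2)!}{(-1)^{n-1}(n-1)!} = \frac{1}{n-1}, \]

which means that the whole contribution from the first pole is $\frac{Q}{L(n-1)}$. For the
other poles, let $\varepsilon = z + 1 - \chi_k$:

\[ f(z) = \frac{1}{Q^{\varepsilon-1+\chi_k}(Q^{\varepsilon+\chi_k}-1)} = \frac{1}{Q^{\varepsilon-1}(Q^{\varepsilon}-1)} \sim \frac{1}{Q^{-1}\varepsilon L}, \]

thus the residue is again $\frac{Q}{L}$, and the kernel ($n \to \infty$) is

\[ [n-2;\chi_k-1] = \frac{\Gamma(n-2+1)\Gamma(-\chi_k+1)}{\Gamma(n-2+1-\chi_k+1)} = \frac{\Gamma(1-\chi_k)\Gamma(n-1)}{\Gamma(n-\chi_k)} \]
\[ \sim \Gamma(1-\chi_k)\,n^{\chi_k-1} = \frac{1}{n}\Gamma(1-\chi_k)\,e^{\chi_k\log n} = \frac{1}{n}\Gamma(1-\chi_k)\,e^{2k\pi\mathbf{i}\log_Q n}, \]

so altogether, we have

\[ \frac{Q}{Ln}\sum_{k\neq 0}\Gamma(1-\chi_k)\,e^{2k\pi\mathbf{i}\log_Q n}. \]

So the expected value in the 'all weak' case is

\[ [z^n]F^{(w)}(z) \sim \frac{d(d-1)\,p^{2}Q}{Ln}\bigl(1+\psi(\log_Q n)\bigr) \quad \text{as } n \to \infty \]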
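The 'all weak' coefficient can be compared with its leading term by evaluating the double sum over j and k directly. This is a sketch assuming the coefficient d(d-1) Σ_j Σ_{k>j} p² q^{j+k-2} (q^{j-1} - q^k)^{n-2} of the generating function above and the leading term d(d-1) p² Q/(Ln); the function names and the truncation `terms` are illustrative.

```python
import math

def both_weak_coefficient(p, n, d, terms=200):
    """d(d-1) * sum_{j>=1} sum_{k>j} p^2 q^(j+k-2) (q^(j-1) - q^k)^(n-2)."""
    q = 1.0 - p
    total = 0.0
    for j in range(1, terms + 1):
        for k in range(j + 1, terms + 1):
            # q^(j-1) - q^k is the probability of a letter in {j, ..., k}
            total += p * p * q ** (j + k - 2) * (q ** (j - 1) - q ** k) ** (n - 2)
    return d * (d - 1) * total

def both_weak_leading_term(p, n, d):
    """Leading term d(d-1) p^2 Q / (L n), ignoring the fluctuation psi."""
    Q = 1.0 / (1.0 - p)
    L = math.log(Q)
    return d * (d - 1) * p * p * Q / (L * n)

if __name__ == "__main__":
    for n in (200, 2000):
        print(n, both_weak_coefficient(0.5, n, 3) / both_weak_leading_term(0.5, n, 3))
```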




(where $\psi(x) = \sum_{k\neq 0}\Gamma(1-\chi_k)\,e^{2k\pi\mathbf{i}x}$). By comparing the 'all strict' case and the 'all
weak' case we can see what a difference it makes to have repeats. The larger case
(all weak) is of order $\frac{1}{n}$ whereas the smaller case (all strict) is of order $\frac{1}{nQ^{n}}$. Since
the minimum is likely to occur very often, and the maximum very seldom (due
to the geometric probabilities attached to the letters), we would expect the cases
with a strict or weak minimum to dominate the various maximums. For interest
we therefore look at one more case - where all the maximums are still strict and
all the minimums are weak.



5.3. Max occurs only once and min can recur 

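The single generating function which expresses this situation is

\[ 2\sum_{j\ge 1}\sum_{k>j}\sum_{i=0}^{d-2}\sum_{h=0}^{d-2-i}\Bigl(\sum_{l=j}^{k-1}zpq^{l-1}\Bigr)^{i} zpq^{j-1}\Bigl(\sum_{l=j}^{k-1}zpq^{l-1}\Bigr)^{h} zpq^{k-1}\Bigl(\sum_{l=j}^{k-1}zpq^{l-1}\Bigr)^{d-2-i-h}\frac{1}{1-\sum_{l=j}^{k-1}zpq^{l-1}}. \]

The expected value for the 'min weak' and 'max strict' is therefore

\[ \frac{d(d-1)\,p^{2}}{Ln}\bigl(1+\psi(\log_Q n)\bigr), \]

which is also of order $\frac{1}{n}$, as in the weak case.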



Acknowledgements. I would like to acknowledge the continued help, availability 
and support of my supervisors Prof. A. Knopfmacher and Prof. H. Prodinger. 



References 

[1] M. Archibald, A. Knopfmacher and H. Prodinger The number of distinct values in
a geometrically distributed sample, Submitted.

[2] P. Flajolet and R. Sedgewick Mellin transforms and asymptotics: Finite differences 
and Rice’s integrals, Theoretical Computer Science, 1995, 144:101-124, Special vol- 
ume on mathematical analysis of algorithm. 

[3] P. Kirschenhofer and H. Prodinger On the Analysis of Probabilistic Counting, Lecture 
Notes in Mathematics, 1990, 1452:117-120. 

[4] A. Knopfmacher and H. Prodinger Combinatorics of geometrically distributed ran- 
dom variables: Value and position of the rth left-to-right maximum. Discrete Math- 
ematics, 2001, 226:255-267. 

[5] A. Knopfmacher and N. Robbins Compositions with parts constrained by the leading 
summand. (To appear in Ars Combinatoria) . 

[6] G. Louchard and H. Prodinger Ascending runs of sequences of geometrically dis- 
tributed random variables: A probabilistic analysis. Theoretical Computer Science, 
2003, 304:59-86. 

[7] H. Prodinger Combinatorial problems related to geometrically distributed random 
variables, in Seminaire Lotharingien de Combinatoire (Gerolfingen, 1993), Prepubl. 
Inst. Rech. Math. Av., 1993/34:87-95. 

[8] H. Prodinger Combinatorics of geometrically distributed random variables: Left-to- 
right maxima. Discrete Mathematics, 1996, 153:253-270. 







[9] H. Prodinger Combinatorics of geometrically distributed random variables: New q- 
tangent and q- secant numbers, International Journal of Mathematics and Mathemat- 
ical Sciences, 2000, 24:825-838. 

[10] H. Prodinger Combinatorics of geometrically distributed random variables: Inver- 
sions and a parameter of Knuth, Annals of Combinatorics, 2001, 5:241-250. 

[11] W. Szpankowski Average Case Analysis of Algorithms on Sequences, John Wiley and 
Sons, New York, 2001. 

Margaret Archibald 

The John Knopfmacher Centre for Applicable Analysis and Number Theory, School 

of Mathematics, University of the Witwatersrand, P.O. Wits, 2050 Johannesburg,

South Africa. 

Email: marchibald@maths.wits.ac.za 




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Dual Random Fragmentation and Coagulation 
and an Application to the Genealogy of Yule 
Processes 

Jean Bertoin and Christina Goldschmidt 



ABSTRACT: The purpose of this work is to describe a duality between a 
fragmentation associated to certain Dirichlet distributions and a natural random 
coagulation. The dual fragmentation and coalescent chains arising in this setting 
appear in the description of the genealogy of Yule processes. 



1. Introduction 

At a naive level, fragmentation and coagulation are inverse phenomena, in that 
a simple time-reversal changes one into the other. However, stochastic models for 
fragmentation and coalescence usually impose strong hypotheses on the dynam- 
ics of the processes, such as the branching property for fragmentation (distinct 
fragments evolve independently as time passes), and these requirements do not 
tend to be compatible with time-reversal. Thus, in general, the time-reversal of a 
coalescent process is not a fragmentation process. 

Nonetheless, there are a few special cases in which time-reversal does trans- 
form a coalescent process into a fragmentation process. Probably the most impor- 
tant example was discovered by Pitman [17]; it is related to the so-called cascades 
of Ruelle and the Bolthausen-Sznitman coalescent [7] , and also has a natural inter- 
pretation in terms of the genealogy of a remarkable branching process considered 
by Neveu, see [4] and [6]. 

The first purpose of this note is to point out other simple instances of such 
duality, which rely on certain Dirichlet and Poisson-Dirichlet distributions. Then, 
in the second part, we shall show that these examples are related to the genealogy 
of Yule processes. 



2. Dual fragmentation and coagulation 

2.1. Some notation 

For every integer $n \ge 1$, we consider the simplex

\[ \Delta_n := \Bigl\{ x = (x_1,\dots,x_{n+1}) : x_i \ge 0 \text{ for every } i = 1,\dots,n+1 \text{ and } \sum_{i=1}^{n+1}x_i = 1 \Bigr\}. \]

It will also be convenient to agree that $\Delta_0 := \{1\}$. We shall often refer to the
coordinates $x_1,\dots,x_{n+1}$ of points x in $\Delta_n$ as masses.








We recall that the n-dimensional Dirichlet distribution with parameter $(\alpha_1,\dots,\alpha_{n+1})$
is the probability measure on the simplex $\Delta_n$ with density

\[ \frac{\Gamma(\alpha_1+\cdots+\alpha_{n+1})}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_{n+1})}\,x_1^{\alpha_1-1}\cdots x_{n+1}^{\alpha_{n+1}-1}. \]

The special case when $\alpha_1 = \dots = \alpha_{n+1} := a \in\, ]0,\infty[$ will have an important role
in this work; it will be convenient to write $\mathrm{Dir}_n(a)$ for this distribution. We recall
the following well-known construction: let $\gamma_1,\dots,\gamma_{n+1}$ be i.i.d. gamma variables
with parameters (a, c). Set $\gamma = \gamma_1+\cdots+\gamma_{n+1}$, so that $\gamma$ has a gamma distribution
with parameters (a(n + 1), c). Then the (n + 1)-tuple

\[ (\gamma_1/\gamma,\dots,\gamma_{n+1}/\gamma) \]

has the distribution $\mathrm{Dir}_n(a)$ and is independent of $\gamma$.

We also define the (ranked) infinite simplex

\[ \Delta_\infty := \Bigl\{ x = (x_1,x_2,\dots) : x_1 \ge x_2 \ge \cdots \ge 0 \text{ and } \sum_{i=1}^{\infty}x_i = 1 \Bigr\} \]

and recall that the Poisson-Dirichlet distribution with parameter $\theta > 0$, which will
be denoted by PD($\theta$) in the sequel, is the law of the random sequence

\[ \xi := \Bigl( \frac{a_1}{\sum_{i=1}^{\infty}a_i}, \frac{a_2}{\sum_{i=1}^{\infty}a_i}, \dots \Bigr), \]

where $a_1 > a_2 > \dots > 0$ are the atoms of a Poisson random measure on $]0,\infty[$
with intensity $\theta y^{-1}e^{-y}\,dy$. We also recall that $\xi$ is independent of $\sum_{i=1}^{\infty}a_i$;
the latter has the gamma distribution with parameters ($\theta$, 1). By the celebrated
Levy-Ito decomposition of subordinators, we may also rephrase this construction
as follows: if $\gamma = (\gamma(t), t \ge 0)$ is a standard gamma process and, for each fixed
$\theta > 0$, $\delta_1 > \delta_2 > \cdots$ denotes the sequence of sizes of the jumps of $\gamma$ on the time
interval $[0,\theta]$, then

\[ \Bigl( \frac{\delta_1}{\gamma(\theta)}, \frac{\delta_2}{\gamma(\theta)}, \dots \Bigr) \]

has the PD($\theta$) distribution and is independent of $\gamma(\theta)$.

2.2. Two dual random transformations 

We now define two random transformations:

\[ \mathrm{Frag}_k : \Delta_n \to \Delta_{n+k} \quad \text{and} \quad \mathrm{Coag}_k : \Delta_{n+k} \to \Delta_n, \]

where k, n are integers.

First, we fix $x = (x_1,\dots,x_{n+1}) \in \Delta_n$ and pick an index $I \in \{1,\dots,n+1\}$
at random according to the distribution

\[ P(I = i) = x_i, \quad i = 1,\dots,n+1, \]

so that $x_I$ is a size-biased pick from the sequence x. Let $\eta = (\eta_1,\dots,\eta_{k+1})$ be a
random variable with values in $\Delta_k$ which is distributed according to $\mathrm{Dir}_k(1/k)$ and
independent of I. Then we split the Ith mass of x according to $\eta$ and we obtain a
random variable in $\Delta_{n+k}$:

\[ \mathrm{Frag}_k(x) := (x_1,\dots,x_{I-1},x_I\eta_1,\dots,x_I\eta_{k+1},x_{I+1},\dots,x_{n+1}). \]

Second, we fix $x = (x_1,\dots,x_{n+k+1}) \in \Delta_{n+k}$ and pick an index $J \in \{1,\dots,n+1\}$
uniformly at random. We merge the k + 1 masses $x_J, x_{J+1},\dots,x_{J+k}$ to form a
single mass $\sum_{i=J}^{J+k}x_i$ and leave the other masses unchanged. We obtain a random
variable in $\Delta_n$:

\[ \mathrm{Coag}_k(x) = \Bigl( x_1,\dots,x_{J-1},\sum_{i=J}^{J+k}x_i,x_{J+k+1},\dots,x_{n+k+1} \Bigr). \]

V i=J 

Remark. Consider the following alternative random coagulation of $x = (x_1,\dots,x_{n+k+1}) \in \Delta_{n+k}$.
Pick k + 1 indices $i_1,\dots,i_{k+1}$ from $\{1,\dots,n+k+1\}$ uniformly
at random without replacement, merge the masses $x_{i_1},\dots,x_{i_{k+1}}$, leave the other
masses unchanged and let $\widetilde{\mathrm{Coag}}_k(x)$ be the sequence obtained by ranking the
resulting masses in decreasing order. Write also $\mathrm{Coag}^{\downarrow}_k(x)$ for the sequence $\mathrm{Coag}_k(x)$
re-arranged in decreasing order. Then if $\xi$ is exchangeable the pairs $(\xi,\widetilde{\mathrm{Coag}}_k(\xi))$
and $(\xi,\mathrm{Coag}^{\downarrow}_k(\xi))$ have the same distribution. This remark applies in particular
to the case when $\xi$ has the law $\mathrm{Dir}_{n+k}(1/k)$, and can thus be combined with the
forthcoming Proposition 2.1.

The starting point of this work lies in the observation of a simple relation of 
duality which links these two random transformations via Dirichlet laws. 

Proposition 2.1. Let $k, n \ge 1$ be two integers, and $\xi$, $\xi'$ two random variables with
values in $\Delta_n$ and $\Delta_{n+k}$, respectively. The following assertions are then equivalent:
(i) $\xi$ has the law $\mathrm{Dir}_n(1/k)$ and, conditionally on $\xi$, $\xi'$ is distributed as
$\mathrm{Frag}_k(\xi)$.
(ii) $\xi'$ has the law $\mathrm{Dir}_{n+k}(1/k)$ and, conditionally on $\xi'$, $\xi$ is distributed as
$\mathrm{Coag}_k(\xi')$.

It has been observed by Kingman [13] that for k = 1, if $\xi'$ is uniformly
distributed on the simplex $\Delta_{n+1}$ (i.e. has the law $\mathrm{Dir}_{n+1}(1)$), then $\mathrm{Coag}_1(\xi')$ is
uniformly distributed on $\Delta_n$. Clearly, this agrees with our statement.
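The two transformations are easy to simulate. The sketch below assumes that Frag_k splits a size-biased mass by an independent Dir_k(1/k) vector and that Coag_k merges k + 1 consecutive masses starting at a uniform index; the Dirichlet vector is sampled by normalising i.i.d. gamma variables. The function names and the choice to return the picked index alongside the result are illustrative.

```python
import random

def sample_dirichlet(size, a, rng):
    """Symmetric Dirichlet with `size` coordinates via normalised Gamma(a, 1) variables."""
    g = [rng.gammavariate(a, 1.0) for _ in range(size)]
    s = sum(g)
    return [v / s for v in g]

def frag(x, k, rng):
    """Split a size-biased mass of x into k + 1 pieces using Dir_k(1/k)."""
    i = rng.choices(range(len(x)), weights=x)[0]   # P(I = i) = x_i
    eta = sample_dirichlet(k + 1, 1.0 / k, rng)
    return x[:i] + [x[i] * e for e in eta] + x[i + 1:], i

def coag(x, k, rng):
    """Merge k + 1 consecutive masses starting at a uniform index J."""
    j = rng.randrange(len(x) - k)                  # len(x) - k = n + 1 choices
    return x[:j] + [sum(x[j:j + k + 1])] + x[j + k + 1:], j

if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 4, 2
    xi = sample_dirichlet(n + 1, 1.0 / k, rng)     # a point of Delta_n
    xi_prime, i = frag(xi, k, rng)                 # a point of Delta_{n+k}
    back, j = coag(xi_prime, k, rng)               # a point of Delta_n
    print(len(xi), len(xi_prime), len(back))
```

Merging the k + 1 masses at the index returned by `frag` recovers the original vector, which is the "obvious coagulation" used in the proof.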




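Proof: Let $\gamma_1,\gamma_2,\dots,\gamma_{n+1}$ be independent Gamma(1/k, 1) random variables and
set

\[ \gamma = \sum_{i=1}^{n+1}\gamma_i \quad \text{and} \quad \xi = \Bigl(\frac{\gamma_1}{\gamma},\dots,\frac{\gamma_{n+1}}{\gamma}\Bigr), \]

so that $\xi$ has law $\mathrm{Dir}_n(1/k)$ and is independent of $\gamma$. Suppose that $\eta$ is a $\mathrm{Dir}_k(1/k)$
random variable which is independent of the $\gamma_i$'s, and let $\Phi : \Delta_{n+k} \to \mathbb{R}$
be a bounded measurable function. Let I be an index picked at random from
$\{1,\dots,n+1\}$ according to the conditional distribution

\[ P(I = i \mid \gamma_1,\dots,\gamma_{n+1}) = \gamma_i/\gamma, \quad i = 1,\dots,n+1, \]

and denote by $\mathrm{Frag}_k(\xi)$ the random sequence obtained from $\xi$ after the fragmentation
of its Ith mass according to $\eta$. We have

\[ E\bigl(\Phi(\mathrm{Frag}_k(\xi)), I = i\bigr) = E\Bigl[\frac{\gamma_i}{\gamma}\,\Phi\bigl((\gamma_j/\gamma)_{j<i},\,\gamma_i\eta/\gamma,\,(\gamma_j/\gamma)_{j>i}\bigr)\Bigr]. \]

Now, using the independence of $\gamma$ and $\xi$ and the fact that $\gamma$ has the law Gamma((n +
1)/k, 1), we see that the last expression is equal to

\[ \frac{k}{n+1}\,E\bigl[\gamma_i\,\Phi\bigl((\gamma_j/\gamma)_{j<i},\,\gamma_i\eta/\gamma,\,(\gamma_j/\gamma)_{j>i}\bigr)\bigr] \]
\[ = \frac{k}{n+1}\,E\Bigl[\int_0^{\infty} x\,\Phi\Bigl(\Bigl(\frac{\gamma_j}{x+\sum_{l\neq i}\gamma_l}\Bigr)_{j<i},\,\frac{x\eta}{x+\sum_{l\neq i}\gamma_l},\,\Bigl(\frac{\gamma_j}{x+\sum_{l\neq i}\gamma_l}\Bigr)_{j>i}\Bigr)\frac{x^{1/k-1}e^{-x}}{\Gamma(1/k)}\,dx\Bigr] \]
\[ = \frac{1}{n+1}\,E\Bigl[\Phi\Bigl(\Bigl(\frac{\gamma_j}{\gamma'+\sum_{l\neq i}\gamma_l}\Bigr)_{j<i},\,\frac{\gamma'\eta}{\gamma'+\sum_{l\neq i}\gamma_l},\,\Bigl(\frac{\gamma_j}{\gamma'+\sum_{l\neq i}\gamma_l}\Bigr)_{j>i}\Bigr)\Bigr], \]

where $\gamma' \sim$ Gamma((k + 1)/k, 1), independently of $\eta$ and $(\gamma_j)_{j\neq i}$. But then $\gamma'\eta$ is a
collection of k + 1 independent Gamma(1/k, 1) random variables, so $\mathrm{Frag}_k(\xi)$ has
the law $\mathrm{Dir}_{n+k}(1/k)$ and is independent of the random index I which is uniformly
distributed on $\{1,\dots,n+1\}$. Since we can recover $\xi$ from $\mathrm{Frag}_k(\xi)$ and I by an
obvious coagulation, this completes the proof. □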

Next we turn our attention to the infinite ranked simplex and define two
random transformations, $\mathrm{Frag}_\infty : \Delta_\infty \to \Delta_\infty$ and $\mathrm{Coag}_\alpha : \Delta_\infty \to \Delta_\infty$, where
$\alpha \in [0,1]$ is some parameter. The fragmentation transformation on the infinite
simplex simply mimics that on the finite simplex; in this direction, recall that the
Poisson-Dirichlet PD(1) arises as the weak limit as $k \to \infty$ of a sequence of $\mathrm{Dir}_k(1/k)$
variables after obvious re-ordering. More precisely, given $x = (x_1,\dots) \in \Delta_\infty$, we
pick a mass $x_I$ at random by size-biased sampling and split $x_I$ using an independent
variable $\eta = (\eta_1,\dots)$ with law PD(1). In other words, $\mathrm{Frag}_\infty(x)$ is the ranked
sequence with unordered terms $x_1,\dots,x_{I-1},x_I\eta_1,x_I\eta_2,\dots,x_{I+1},\dots$

Next, consider a sequence $U_1, U_2,\dots$ of i.i.d. uniform random variables and
$\alpha \in [0,1]$. Starting again from some fixed $x \in \Delta_\infty$, we merge the masses $x_i$ for
which $U_i \le \alpha$ into a single mass and leave the others unchanged. We denote
by $\mathrm{Coag}_\alpha(x)$ the random sequence obtained by putting the resulting masses in
decreasing order. We then have the following analogue of Proposition 2.1, which
is reminiscent of Corollary 13 of Pitman [17].

Proposition 2.2. Let $\xi$, $\xi'$ be two random variables with values in $\Delta_\infty$. For every
$\theta > 0$, the following assertions are equivalent:

(i) $\xi$ has the law PD($\theta$) and, conditionally on $\xi$, $\xi'$ is distributed as $\mathrm{Frag}_\infty(\xi)$.
(ii) $\xi'$ has the law PD($\theta$ + 1) and, conditionally on $\xi'$, $\xi$ is distributed as
$\mathrm{Coag}_{1/(\theta+1)}(\xi')$.
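The infinite-simplex operators can be illustrated on truncated sequences. The sketch below is an approximation, not the exact objects: it uses the standard stick-breaking construction with i.i.d. Beta(1, θ) proportions, stops after a fixed number of sticks (an assumption made here for illustration) and lumps the leftover mass into one final term. The function names and the parameter `n_sticks` are illustrative.

```python
import random

def sample_pd_truncated(theta, n_sticks, rng):
    """Approximate PD(theta): rank a truncated stick-breaking sequence."""
    remaining = 1.0
    masses = []
    for _ in range(n_sticks):
        b = rng.betavariate(1.0, theta)   # proportion broken off the remaining stick
        masses.append(remaining * b)
        remaining *= 1.0 - b
    masses.append(remaining)              # leftover mass, so the total is 1
    return sorted(masses, reverse=True)

def coag_alpha(x, alpha, rng):
    """Merge the masses whose independent uniform mark is at most alpha, then rank."""
    merged = 0.0
    kept = []
    for mass in x:
        if rng.random() <= alpha:
            merged += mass
        else:
            kept.append(mass)
    if merged > 0.0:
        kept.append(merged)
    return sorted(kept, reverse=True)

def frag_infinity(x, n_sticks, rng):
    """Split a size-biased mass of x by an (approximate) PD(1) sequence, then rank."""
    i = rng.choices(range(len(x)), weights=x)[0]
    eta = sample_pd_truncated(1.0, n_sticks, rng)
    pieces = [x[i] * e for e in eta]
    return sorted(x[:i] + pieces + x[i + 1:], reverse=True)

if __name__ == "__main__":
    rng = random.Random(1)
    theta = 2.0
    xi = sample_pd_truncated(theta, 200, rng)
    xi_prime = frag_infinity(xi, 200, rng)
    back = coag_alpha(xi_prime, 1.0 / (theta + 1.0), rng)
    print(xi[:3], xi_prime[:3], back[:3])
```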

Proof: Let $\gamma = (\gamma(t), t \ge 0)$ be a standard gamma process and set

\[ D_t = \gamma((\theta+1)t)/\gamma(\theta+1), \]

for $0 \le t \le 1$, so that $(D_t, 0 \le t \le 1)$ is a Dirichlet process of parameter $\theta + 1$. (The
vector of ordered jumps of this Dirichlet process has the PD($\theta$ + 1) distribution.)
Consider the following alternative way of thinking of the random coagulation
operator $\mathrm{Coag}_{1/(\theta+1)}$: pick a point V uniformly in [0, 1] and define a new process
$(D'_t, 0 \le t \le 1)$ by

\[ D'_t = \begin{cases} D_{\theta t/(\theta+1)} & \text{if } t < V, \\ D_{(1+\theta t)/(\theta+1)} & \text{if } t \ge V. \end{cases} \]

As the times of the jumps of D are uniformly distributed on [0, 1], this picks a
proportion $1/(\theta+1)$ of them and coalesces them into a single jump (say $\beta^{*} =
D_{(1+\theta V)/(\theta+1)} - D_{\theta V/(\theta+1)}$) at V. Let $\beta_1 \ge \beta_2 \ge \dots \ge 0$ be the sequence of other
jumps of D' and $U_1, U_2, \dots$ the corresponding jump times. Let $\beta^{*}_1 \ge \beta^{*}_2 \ge \dots \ge 0$
be the sequence of jumps of D in the interval $[\theta V/(\theta+1), (1+\theta V)/(\theta+1)]$, so
that $\beta^{*} = \sum_{i\ge 1}\beta^{*}_i$. We wish to show that D' is a Dirichlet process with parameter
$\theta$, so that the vector $(\beta^{*},\beta_1,\beta_2,\dots)$ of its jumps (re-arranged in decreasing
order) has the PD($\theta$) distribution. We will also show that the mass $\beta^{*}$ resulting
from the coalescence constitutes a size-biased pick from this vector.







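Let

\[ \gamma^{1}(t) = \begin{cases} \gamma(t) & \text{if } 0 \le t < V\theta, \\ \gamma(t+1) - (\gamma(V\theta+1) - \gamma(V\theta)) & \text{if } V\theta \le t \le \theta, \end{cases} \]
\[ \gamma^{2}(t) = \gamma(V\theta+t) - \gamma(V\theta) \quad \text{for } 0 \le t \le 1. \]

Then $\gamma^{1}$ and $\gamma^{2}$ are independent processes with $\gamma^{1} \overset{d}{=} (\gamma(t), 0 \le t \le \theta)$ and
$\gamma^{2} \overset{d}{=} (\gamma(t), 0 \le t \le 1)$, independently of V. Write $\delta^{1}_1 > \delta^{1}_2 > \dots$ for the ordered
sequence of jumps of $\gamma^{1}$ and $T_1, T_2, \dots$ for the corresponding times of these jumps.
Write $\delta^{2}_1 > \delta^{2}_2 > \dots$ for the ordered sequence of jumps of $\gamma^{2}$. Then

(i) $U_1 = T_1/\theta$, $U_2 = T_2/\theta$, ... are i.i.d. U[0, 1],

(ii) $\beta^{*} = \gamma^{2}(1)/\gamma(1+\theta)$ and so has a Beta(1, $\theta$) distribution,

(iii) $\frac{1}{\beta^{*}}(\beta^{*}_1,\beta^{*}_2,\dots) = \frac{1}{\gamma^{2}(1)}(\delta^{2}_1,\delta^{2}_2,\dots)$ and so has the PD(1) distribution,

(iv) $\frac{1}{1-\beta^{*}}(\beta_1,\beta_2,\dots) = \frac{1}{\gamma^{1}(\theta)}(\delta^{1}_1,\delta^{1}_2,\dots)$ and so has the PD($\theta$) distribution.

Furthermore, the random variables in (i) to (iv) above are independent. The fact
that $\beta^{*}$ is a size-biased pick from $(\beta^{*},\beta_1,\beta_2,\dots)$ and the PD($\theta$) distribution of
the latter follow from (i) and (iii) and the stick-breaking scheme (see, for instance,
Definition 1 in Pitman and Yor [19]). That D' is a Dirichlet process of parameter
$\theta$ then follows from (iv) and the independence.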

The coagulation operator used here can be re-phrased as follows: starting with
$x \in \Delta_\infty$, take a sequence $V_1, V_2, \dots$ of i.i.d. U[0, 1] random variables, merge the
masses $x_i$ for which $V_i \in [\theta V/(\theta+1), (1+\theta V)/(\theta+1)]$, leave the other masses
unchanged and, finally, rank the resulting sequence in decreasing order. Call this
operator $\widetilde{\mathrm{Coag}}_{1/(\theta+1)}$. Then it is clear that whenever $\xi'$ is a random exchangeable
sequence in $\Delta_\infty$, $(\xi',\mathrm{Coag}_{1/(\theta+1)}(\xi'))$ and $(\xi',\widetilde{\mathrm{Coag}}_{1/(\theta+1)}(\xi'))$ have the same
distribution. Our claim follows now readily from these results. □

Remark. It may be interesting to check Proposition 2.2 as follows. Consider a Poisson
random measure M on $(0,\infty)$ with intensity $\theta x^{-1}e^{-x}\,dx$. Let $a_1, a_2, \dots$ be the
atoms of M in decreasing order, so that





where $a' \sim$ Exp(1), independently of M and $\eta$. But then $a'\eta$ has the distribution
of the atoms of a Poisson random measure with intensity $x^{-1}e^{-x}\,dx$ arranged in
decreasing order and so we see that taking these atoms together with those of
M, we obtain a Poisson random measure of intensity $(\theta+1)x^{-1}e^{-x}\,dx$. Hence,
$\mathrm{Frag}_\infty(\xi)$ has the law PD($\theta$ + 1).

2.3. Dual firagmentation and coagulation chains 

The dual fragmentation and coagulation operators that were defined in the pre- 
ceding section incite us to introduce Markov fragmentation and coagulation chains 
in duality by time-reversal. Specifically, we consider for each integer fc > 1 a chain 

where X^^\n) is a random variable with values in (in particular X^^^ (0) = 1), 
and the conditional distribution of X^^\n -f- 1) given X^^\n) = x is the law of 
Prag;.(x). We deduce from Proposition 2.1 by induction that for each n, X^^\n) 
has the distribution Dirn/c(l/A:). The time-reversed coagulation chain 

. . . , X^^\n -f 1), . . . , 

is also Markov; more precisely, this means that the conditional distribution of 
X(^)(n) given X^^\n + 1) = a; is the law of Coagj^{x). Note that for k = this 
has the distribution of the jump chain of Kingman’s coalescent [13]. 

Analogously, for $k = \infty$, we can define a Markov fragmentation chain on $\Delta_\infty$,

$$X^{(\infty)} = (X^{(\infty)}(n), n \geq 0),$$

such that the conditional distribution of $X^{(\infty)}(n+1)$ given $X^{(\infty)}(n) = x$ is the law
of $\mathrm{Frag}_\infty(x)$. We deduce by induction from Proposition 2.2 that for every $\theta > 0$,
if the distribution of the initial state is PD($\theta$) then, for every integer $n$,
$X^{(\infty)}(n)$ has the distribution PD($\theta + n$). Moreover, in this case, the time-reversed
coagulation chain

$$\ldots, X^{(\infty)}(n+1), X^{(\infty)}(n), \ldots, X^{(\infty)}(1),$$

is also Markov; more precisely, the conditional distribution of $X^{(\infty)}(n)$ given
$X^{(\infty)}(n+1) = x$ is the law of $\mathrm{Coag}_{1/(\theta+n+1)}(x)$.

Remarks. (a) Recall that the parameter $\theta$ can be recovered from a sample $\xi$ of a
PD($\theta$) random variable as follows:

$$\theta = \lim_{\varepsilon \to 0+} \frac{1}{\log(1/\varepsilon)} \max\{n : \xi_n > \varepsilon\}.$$

This shows that the above description for the reversed coagulation chain is indeed
Markovian.
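The recovery formula of Remark (a) can be tried out numerically. The following is only a minimal sketch under stated assumptions: the helper names `sample_pd_stick_breaking` and `theta_estimate` are hypothetical, and the sample is produced by the standard stick-breaking (GEM) construction, which is an assumption of this illustration and is not the construction used in the text. Since the convergence is logarithmic in $\varepsilon$, the estimate is rough for any practical $\varepsilon$.

```python
import math
import random


def sample_pd_stick_breaking(theta, n_sticks, rng):
    """Approximate PD(theta) sample: break sticks with Beta(1, theta)
    fractions (GEM construction), then rank the pieces in decreasing order."""
    masses = []
    remaining = 1.0
    for _ in range(n_sticks):
        fraction = rng.betavariate(1.0, theta)
        masses.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return sorted(masses, reverse=True)


def theta_estimate(masses, eps):
    """Finite-eps version of the limit in Remark (a).

    For a decreasing sequence, max{n : xi_n > eps} is simply the number of
    masses exceeding eps."""
    count = sum(1 for mass in masses if mass > eps)
    return count / math.log(1.0 / eps)
```

For instance, `theta_estimate(sample_pd_stick_breaking(2.0, 500, random.Random(0)), 1e-12)` gives a crude estimate of $\theta = 2$; the number of sticks must be large enough that the discarded remainder is far smaller than $\varepsilon$.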

(b) There is a simple representation for the $k = \infty$ fragmentation chain in
terms of compound bridges with exchangeable increments which is inspired by [5].
Let $U_0, U_1, \ldots$ be a sequence of independent uniform variables on [0, 1]. For each
$n$, we consider the elementary bridge $b_n : [0, 1] \to [0, 1]$ defined by

Tl 1 

^ S [0> 1] • 

Then it is easy to check that for every $n \in \mathbb{N}$, the sequence $b_n \circ b_{n+1} \circ \cdots \circ
b_{n+i}$ converges pointwise almost surely as $i \to \infty$ to a bridge with exchangeable
increments $B_n$ which has no drift and infinitely many jumps a.s. If we write $\beta_n \in
\Delta_\infty$ for the sequence of the sizes of the jumps of $B_n$ ranked in decreasing order,
then the chain $(\beta_n, n \in \mathbb{N})$ has the same law as $X^{(\infty)}$. We refer to [5] for the
necessary technical background.



3. The genealogy of Yule processes 

We shall now show that the dual fragmentation and coagulation chains which we 
introduced in the preceding section are naturally connected to the genealogy of 
Yule processes. 



3.1. Discrete setting 

For every integer $k \geq 1$, we write $(Y^{(k)}_t, t \geq 0)$ for the Yule process started
from $Y^{(k)}_0 = 1$: $Y^{(k)}_t$ gives the number of individuals alive at time $t$ in a branching
process in which each individual lives for an exponential time of parameter 1 and
gives birth at its death to $k + 1$ children, which then evolve independently of one
another according to the same rules as their parent. We agree to label each child of
an individual by an integer in $\{1, \ldots, k+1\}$, which allows us to order individuals at
any generation in a consistent way: given two distinct individuals, we may consider
their most recent common ancestor. Plainly, two different children of this ancestor
are ancestors of exactly one of these two individuals, and the labelling of the
children of the most recent common ancestor induces the order of the individuals.

Lemma 3.1. The process $(\exp(-kt) Y^{(k)}_t, t \geq 0)$ is a uniformly integrable
martingale and its limit has the Gamma($1/k$, $1/k$) distribution.



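Proof: A similar limit result is stated in Athreya & Ney [1] on page 130; however,
the limiting distribution given there is incorrect and so we shall provide here a
detailed proof. The martingale property is classical, so we focus on the distribution
of the limit. Define

$$\Phi_t(s) = E\bigl[s^{Y^{(k)}_t}\bigr].$$

The backward equation implies that

$$\frac{\partial}{\partial t}\Phi_t(s) = \Phi_t(s)^{k+1} - \Phi_t(s), \qquad \Phi_0(s) = s.$$

This equation has solution

$$\Phi_t(s) = s e^{-t}\bigl(1 - (1 - e^{-kt}) s^k\bigr)^{-1/k}.$$

Hence, for $\theta < 0$,

$$E\bigl(\exp(\theta e^{-kt} Y^{(k)}_t)\bigr) = \exp(\theta e^{-kt})\, e^{-t}\bigl(1 - (1 - e^{-kt})\exp(k\theta e^{-kt})\bigr)^{-1/k}
= \bigl[e^{kt}\exp(-k\theta e^{-kt}) - e^{kt} + 1\bigr]^{-1/k},$$

and when $t \to \infty$, this quantity converges to

$$(1 - k\theta)^{-1/k} = \left(\frac{1/k}{1/k - \theta}\right)^{1/k},$$

which is the moment generating function of a gamma random variable with
parameters ($1/k$, $1/k$). □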
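The martingale property in Lemma 3.1 implies $E[e^{-kt} Y^{(k)}_t] = 1$ for every $t$, which is easy to check by simulation. The sketch below is illustrative only; the function names are hypothetical. It uses the fact, read off from the description of the Yule process above, that when the population has size $y$ the next death occurs after an exponential time of parameter $y$ and increases the population by $k$.

```python
import math
import random


def yule_population(k, t_end, rng):
    """Population size at time t_end of a Yule process started from one
    individual, where each death replaces an individual by k + 1 children."""
    population = 1
    time = 0.0
    while True:
        # With `population` individuals alive, the next death happens at
        # total rate `population`.
        time += rng.expovariate(population)
        if time > t_end:
            return population
        population += k


def mean_scaled_population(k, t_end, n_runs, seed=0):
    """Monte Carlo estimate of E[exp(-k t) Y_t]; it should be close to 1."""
    rng = random.Random(seed)
    total = sum(yule_population(k, t_end, rng) for _ in range(n_runs))
    return math.exp(-k * t_end) * total / n_runs
```

A call such as `mean_scaled_population(2, 1.0, 4000)` should return a value near 1, with fluctuations of the order of a few hundredths.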







We think of as the size of the terminal population. For every t > 0, by 
application of the branching property at time t, we may decompose the terminal 
population into sub-populations having the same ancestor at time t. Specifically, 

y'(^) 

i=l 

where (t) is the size of the terminal sub-population descending from the zth 

(k) 

individual in the population at time t. Observe that conditionally on , the 
variables are independent and all have the same law as 

Finally, we define the genealogical process > O) associated 

to by 

The genealogical structure of the Yule process can be described in terms of 
the fragmentation chain of Section 2.3 as follows. 

Theorem 3.1. Let $N = (N_t, t \geq 0)$ be a standard Poisson process which is
independent of the chain $X^{(k)}$. Then for each $w > 0$, the compound chain

has the same law as the time-changed process 

$$\left(G^{(k)}\left(\tfrac{1}{k}\log(1 + kt)\right), t \geq 0\right)$$

conditioned on = w. 

Remark. Theorem I of Kendall [12] states that given ^ 

is a Poisson process with unit parameter. This is clearly an aspect of Theorem 3.1. 
Moreover, on page 130 of Athreya & Ney [1], it is suggested that no generalization 
of Kendall’s result to a more general continuous-time Markov branching process 
is known; Theorem 3.1 constitutes a small such generalization. 

Proof: Set $r(t) := \frac{1}{k}\log(1 + kt)$ and let $T$ be the time of the first birth in the
Yule process, which is also the time of the first dislocation of $G^{(k)}$. The $k + 1$
fragments of $G^{(k)}(T)$ can be written as $e^{-kT} Z_1, \ldots, e^{-kT} Z_{k+1}$ where, by the
branching property, $Z_1, \ldots, Z_{k+1}$ are i.i.d. Gamma($1/k$, $1/k$) random variables,
independent of $T$ which is Exp(1). Define a change of variables by

$$S = r^{-1}(T) = (e^{kT} - 1)/k,$$

$$U_1 = e^{-kT} Z_1, \ \ldots, \ U_k = e^{-kT} Z_k, \qquad W = e^{-kT}(Z_1 + \cdots + Z_{k+1}).$$

It is straightforward to see that the joint density of $(T, Z_1, \ldots, Z_{k+1})$ is

$$f(t, z_1, \ldots, z_{k+1}) = e^{-t}\,(1/k)^{(k+1)/k}\,\Gamma(1/k)^{-(k+1)}\,(z_1 z_2 \cdots z_{k+1})^{1/k - 1}\exp(-(z_1 + \cdots + z_{k+1})/k)$$

and so the joint density of $(S, U_1, \ldots, U_k, W)$ is

$$g(s, u_1, \ldots, u_k, w) = w e^{-ws} \cdot (1/k)\,\Gamma(1/k)^{-k}\, w^{-1/k}\,\bigl(u_1 u_2 \cdots u_k (w - u_1 - \cdots - u_k)\bigr)^{1/k - 1}
\cdot (1/k)^{1/k}\,\Gamma(1/k)^{-1}\, w^{1/k - 1}\exp(-w/k).$$

Hence, $W \sim$ Gamma($1/k$, $1/k$) (as we already knew) and, conditional on $W = w$,
we have $S \sim \mathrm{Exp}(w)$ and $(U_1, U_2, \ldots, U_k, W - U_1 - \cdots - U_k) \sim w\,\mathrm{Dir}_k(1/k)$
independently of $S$. Thus, the first dislocation has the correct dynamics. But by
the branching property, subsequent dislocations are independent for different
sub-populations and the total rate of fragmentation is always $w$. Hence the result. □
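Two consequences of the change of variables in this proof are easy to test by simulation: $E[W] = 1$ (the mean of the Gamma($1/k$, $1/k$) law) and $E[SW] = 1$ (because $S$ is Exp($w$) given $W = w$). The following sketch is illustrative only; the function names are hypothetical, and Python's `random.gammavariate(shape, scale)` is called with scale $k$, which corresponds to rate $1/k$.

```python
import math
import random


def first_dislocation(k, rng):
    """Sample (S, W) from (T, Z_1, ..., Z_{k+1}) as in the change of variables."""
    t = rng.expovariate(1.0)                      # T ~ Exp(1)
    z = [rng.gammavariate(1.0 / k, float(k))      # Z_i ~ Gamma(1/k, rate 1/k)
         for _ in range(k + 1)]
    s = (math.exp(k * t) - 1.0) / k               # S = r^{-1}(T)
    w = math.exp(-k * t) * sum(z)                 # W = e^{-kT}(Z_1 + ... + Z_{k+1})
    return s, w


def moment_check(k, n_runs, seed=0):
    """Monte Carlo estimates of E[W] and E[S W]; both should be close to 1."""
    rng = random.Random(seed)
    sum_w = 0.0
    sum_sw = 0.0
    for _ in range(n_runs):
        s, w = first_dislocation(k, rng)
        sum_w += w
        sum_sw += s * w
    return sum_w / n_runs, sum_sw / n_runs
```

The identity $E[SW] = 1$ is only a necessary consequence of the conditional law of $S$; it does not by itself verify the full joint density.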

In the terminology of [2], Theorem 3.1 states that the time-changed
genealogical process $G^{(k)} \circ r$ is a self-similar fragmentation with index 1, dislocation law
$\mathrm{Dir}_k(1/k)$ and erosion coefficient 0. It may be interesting to observe that in the
special case $k = 1$ this result can also be derived as follows.

Consider a real Brownian motion $B$ started from 1 and killed when it reaches
0 (at time $T_0 = \inf\{t \geq 0 : B_t = 0\}$). For every $u \in [0, 1[$, let $Y_u$ denote the number
of excursions of $B$ away from 1 which go below level $u$. Then $(Y^{(1)}_{-\log(1-u)})_{0 \leq u < 1}$
is a version of $(Y_u)_{0 \leq u < 1}$. To see this, let us consider the evolution of $Y$. Firstly,
$Y_0 = 1$, corresponding to the single excursion below 1 which reaches 0. Let $D =
\sup\{t < T_0 : B_t = 1\}$, the starting time of the final excursion which hits 0, let $U =
\inf_{0 \leq t \leq D} B_t$ be the level reached by the deepest excursion below 1 before $D$ and
let $T_U$ be the time at which it is reached. Then, by Williams' path decomposition
theorem (Theorem VII.4.9 of Revuz and Yor [20]), $U$ is distributed uniformly
on $[0, 1[$ and, conditional on $U$, $(B_t)_{0 \leq t \leq T_U}$ is a Brownian motion started at 1
and stopped when it first hits level $U$. By symmetry, $(B_{D-t})_{0 \leq t \leq D - T_U}$ is another
independent Brownian motion started at 1 and stopped when it first hits level
$U$. Thus, $Y$ is equal to 1 on $[0, U[$, $Y_U = 2$ and $(Y_{U+v})_{0 \leq v < 1-U}$ evolves as the
sum of two independent processes which are the same as $Y$ except that the times
until the first jumps are now uniform on $[0, U[$ rather than on $[0, 1[$. (This is
Theorem 8 of Le Gall [16], repeated here for completeness.) Time-changing $Y^{(1)}$
with $u \mapsto -\log(1 - u)$ means that its exponential inter-jump times become uniform
and so we do, indeed, have $(Y^{(1)}_{-\log(1-u)})_{0 \leq u < 1} \overset{d}{=} (Y_u)_{0 \leq u < 1}$.

A more elegant way of expressing the preceding argument is to say that
the Brownian path encodes a continuous-state branching process with quadratic
branching mechanism. The local time at level 1 satisfies

LI = lim 2(1 - u)Yu. 

^ W — >1 — 

In this context, corresponds to the size of the population at time 1 in the 
continuous-state branching process generated by a single ancestor conditioned to
have descendants up to time 1. The so-called reduced tree associated with the
population at time 1 is described up to the deterministic time-change $u \mapsto -\log(1 - u)$
by the Yule process $Y^{(1)}$. See, for instance, Section 2.7 in Duquesne and Le Gall [8],
and Fleischmann and Siegmund-Schultze [9]. Note that the well-known fact that
has an exponential distribution with mean 1 (Proposition VI.4.6 of Revuz 
and Yor [20]) gives another derivation of the limiting distribution in Lemma 3.1, 
since 

= lim = lim (1 - n ^ = 7 t • 

t^oo ^ ^ -log(l-n) 2 ^0 

It is known from excursion theory that in the scale of the local time at level
1, the rate of excursions of $B$ away from 1 which reach level $u \in\, ]0, 1[$ but do
not exceed $u - du$ is $(1 - u)^{-2}\,du$. Note that the map $s \mapsto 1 - \frac{1}{1+s}$ from $\mathbb{R}_+$ to
$[0, 1[$ has inverse $u \mapsto \frac{1}{1-u} - 1$ and, thus, transforms Lebesgue measure on $\mathbb{R}_+$ into
the measure $(1 - u)^{-2}\,du$ on $[0, 1[$. Suppose that we split the local time at level 1
according to the occurrence of excursions exceeding level $u$, so that we obtain the
sequence

$$W(u) = (W_1(u), \ldots, W_{Y_u}(u)),$$

where $W(u)$ is the sequence of the increments of the local time at level 1 on
the maximal time intervals such that at the beginning and end of each interval
$B$ is at 1 and during the interval remains above level $u$. Then it follows easily

that the time-changed process (w , s > 0 ^ is a fragmentation in which 

each mass, say x, splits at rate x into xU and x{l — U) where U is uniform. In 
other words, conditionally on = w, the process (w > 0 ^ is 

distributed as the compound fragmentation chain > O), where N 

is an independent standard Poisson process. 

Finally, the composition of the two time-changes which appear in this analysis
yields

$$s \mapsto -\log\left(1 - \left(1 - \frac{1}{1+s}\right)\right) = \log(1 + s), \qquad s \in \mathbb{R}_+,$$

and so we recover Theorem 3.1 in the special case $k = 1$. Unfortunately, it does
not seem that there are similar interpretations for $k \geq 2$.
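The composition of the two time-changes can be checked numerically. The sketch below is illustrative only; the function names are hypothetical. It composes $s \mapsto 1 - 1/(1+s)$ with $u \mapsto -\log(1-u)$ and compares the result with $r(s) = \frac{1}{k}\log(1+ks)$ for $k = 1$, the time-change of Theorem 3.1.

```python
import math


def to_level(s):
    """Map s in R_+ to the level u = 1 - 1/(1 + s) in [0, 1[."""
    return 1.0 - 1.0 / (1.0 + s)


def to_yule_time(u):
    """Map a level u in [0, 1[ to the Yule time -log(1 - u)."""
    return -math.log(1.0 - u)


def composed(s):
    """Composition of the two time-changes."""
    return to_yule_time(to_level(s))


def r(t, k):
    """Time-change of Theorem 3.1: r(t) = log(1 + k t) / k."""
    return math.log(1.0 + k * t) / k


def r_inverse(x, k):
    """Inverse time-change: r^{-1}(x) = (exp(k x) - 1) / k."""
    return (math.exp(k * x) - 1.0) / k
```

Up to floating-point error, `composed(s)` agrees with `r(s, 1)` for every `s >= 0`.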

Corollary 3.2. We have that 

is a time-homogeneous Markov coalescent process which is independent of 
For any $n \geq 1$, given that it is in state $x \in \Delta_{nk}$, it waits an exponential time of
parameter $n$ and then jumps to a variable distributed as $\mathrm{Coag}_k(x)$, independently
of the exponential time.

Note that the case $k = 1$ of this result gives a variation of Kingman's
coalescent. The jump-chains are identical, as we have already noted, but here the rate of
coalescence of two blocks depends on the total number of blocks present, whereas 
in Kingman’s coalescent it does not. 

Proof: Firstly, we note that by Theorem 3.1, 
has the same law as 

$$(X^{(k)}(N_{e^{-t}}), t \in \mathbb{R})$$

and so we will work with the latter process instead. The $k = 1$ case is essentially
treated in [3] and the proof proceeds in the same way here. The jump chain clearly
behaves in the correct manner and so it remains to check that the inter-jump times
are as claimed. Let $0 < T_1 < T_2 < \ldots$ be the jump times of $(N_t)_{t \geq 0}$. Then the
first instant that $X^{(k)}(N_{e^{-t}})$ has exactly $nk + 1$ terms is

$$\inf\{t \in \mathbb{R} : N_{e^{-t}} = n\} = -\log T_{n+1}.$$

The sequence of inter-jump times is

$$\ldots, \ \log T_{n+1} - \log T_n, \ \log T_n - \log T_{n-1}, \ \ldots, \ \log T_2 - \log T_1$$

and it is easily shown that this is a sequence of independent exponential random
variables with parameters

$$\ldots, \ n, \ n - 1, \ \ldots, \ 1$$

respectively. □
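The claim about the inter-jump times can be probed by simulation: if $T_n$ is the $n$th jump time of a standard Poisson process, then $\log T_{n+1} - \log T_n$ should have mean $1/n$. The sketch below is illustrative only (the function name is hypothetical) and checks just this mean, not independence or the full exponential law.

```python
import math
import random


def log_gap_mean(n, n_runs, seed=0):
    """Monte Carlo estimate of E[log T_{n+1} - log T_n] for a standard
    Poisson process; the claimed Exp(n) law has mean 1/n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        # T_n is a sum of n independent Exp(1) inter-arrival times.
        t_n = sum(rng.expovariate(1.0) for _ in range(n))
        t_next = t_n + rng.expovariate(1.0)
        total += math.log(t_next) - math.log(t_n)
    return total / n_runs
```

For example, `log_gap_mean(3, 5000)` should be close to $1/3$.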

3.2. Continuous setting 

Continuous-state branching processes (or CSBP's) were introduced by Lamperti
[14, 15] as limits of rescaled branching processes. Typically, a CSBP is a
time-homogeneous Markov process with values in $\mathbb{R}_+$,

$$Z = (Z(t, a), \ t \geq 0 \text{ and } a \geq 0),$$

(where the parameter $t$ refers to time and the parameter $a$ to the starting point
i.e. $Z(0, a) = a$ a.s.) which fulfils the branching property: the path-valued process
$(Z(\cdot, x), x \geq 0)$ has independent and stationary increments. In particular, if $Z'(\cdot, y)$
is an independent copy of $Z(\cdot, y)$, then $Z(\cdot, x) + Z'(\cdot, y)$ has the law of $Z(\cdot, x + y)$.
There is a simple relation connecting CSBP's and Bochner's subordination for
subordinators which enables us to define their genealogy; we refer the interested
reader to [4] for heuristics, detailed arguments etc.

We call a continuous state Yule process a CSBP

$$Y = (Y(t, a), \ t \geq 0 \text{ and } a \geq 0),$$

which evolves as follows: for each $a > 0$, the process $Y(\cdot, a)$ waits an exponential
time with parameter $a$ and then jumps to $a + 1$. It then evolves independently as
if it had been started in state $a + 1$. In terms of the genealogy, the sub-population
of size 1 which is born at a jump time has a parent which is chosen uniformly at
random from the population present before the jump. Note that this genealogy
is easy to describe in a consistent manner for different values $a$ of the starting
population.

It is immediate that for an integer starting point $a \in \mathbb{N}$, the process $(Y(t, a),
t \geq 0)$ is a Yule process $Y^{(1)}$ with 2 offspring, as considered in the preceding
section. However, we stress that its genealogy is not the same as that of $Y^{(1)}$,
as we are dealing with a continuous population in the first case and a discrete
population in the second.

We have the following analogue of Lemma 3.1: 

Lemma 3.3. For every $a > 0$, the process $(e^{-t} Y(t, a), t \geq 0)$ is a uniformly
integrable martingale. Its limit, say $\gamma(a)$, viewed as a process in the variable $a$, has
the same finite dimensional laws as a standard gamma process.

Proof: For $a = 1$, we see from Lemma 3.1 and the identity in distribution
$Y(\cdot, 1) \overset{d}{=} Y^{(1)}$ that $(e^{-t} Y(t, 1), t \geq 0)$ is a uniformly integrable martingale and that its
limit has the standard exponential distribution. The proof is easily completed by
an appeal to the branching property. □

Remark. The limiting distribution in Lemma 3.3 is essentially a corollary of The- 
orem 3 of Grey [10]. 







Just as in the preceding section, we think of $\gamma(a)$ as the size of the terminal
population when the initial population has size $a$. We can express $\gamma(a)$ as

$$\gamma(a) = \sum_{b \leq a} \delta(b),$$

where $\delta := (\delta(b), b \geq 0)$ is the jump process of $\gamma$, which corresponds to decomposing
the terminal population into sub-populations having the same ancestor at the
initial time. We write $G(0, a)$ for the sequence of the jumps of $\gamma$ on $[0, a]$, ranked
in decreasing order, and we deduce from Lemma 3.3 that conditionally on $\gamma(a) = g$,
$G(0, a)/g$ has distribution PD($a$).

More generally, by the branching property, we can decompose the terminal
population into sub-populations having the same ancestor at any given time $t$.
This gives

$$\gamma(a) = e^{-t} \sum_{b \leq Y(t, a)} \delta^{(t)}(b),$$

where $\delta^{(t)} := (\delta^{(t)}(b), b \geq 0)$ is the jump process of a standard gamma process $\gamma^{(t)}$
which is independent of the Yule process up to time $t$, $(Y(s, c), s \in [0, t] \text{ and } c \geq 0)$.
This enables us to define for each $a > 0$ the genealogical process associated to a
Yule process $Y(\cdot, a)$,

$$G(\cdot, a) = (G(t, a), t \geq 0),$$

where $e^{t} G(t, a)$ is the ranked sequence of the sizes of the jumps of the subordinator
$\gamma^{(t)}$ on the interval $[0, Y(t, a)]$.

An easy variation of the arguments for the proof of Theorem 3.1 shows that 
the genealogical structure of the Yule process can be described in terms of the 
fragmentation chain of Section 2.3 as follows. 

Theorem 3.2. Fix $a, g > 0$ and let the chain $X^{(\infty)}$ have initial distribution PD($a$).
Introduce a standard Poisson process, $N = (N_t, t \geq 0)$, which is independent of
the chain $X^{(\infty)}$. Then the compound chain

has the same law as the time-changed process

$$(G(\log(1 + t), a), t \geq 0)$$

conditioned on $\gamma(a) = g$.

Likewise, the analogue of Corollary 3.2 is as follows. 

Corollary 3.4. Fix $a > 0$. Then

$$\left(G\left(\log(1 + e^{-t}/\gamma(a)), a\right), t \in \mathbb{R}\right)$$

is a time-homogeneous Markov coalescent process which is independent of $\gamma(a)$.
Suppose that it is in state $x \in \Delta_\infty$ and recall Remark (a) of Section 2.3. Then if

$$\lim_{\varepsilon \to 0+} \frac{1}{\log(1/\varepsilon)} \max\{i : x_i > \varepsilon\} = n + a,$$

the process waits an exponential time of parameter $n$ and then jumps to a variable
distributed as $\mathrm{Coag}_{1/(n+a)}(x)$, independently of the exponential time.







References 

[1] Athreya, K.B. and Ney, P.E. (1972). Branching processes. Springer-Verlag,
Berlin-Heidelberg-New York

[2] Bertoin, J. (2002). Self-similar fragmentations. Ann. Inst. Henri Poincaré 38, 319-
340

[3] Bertoin, J. (2003). Random covering of an interval and a variation of Kingman’s 
coalescent. To appear in Random Structures Algorithms. Also available as Preprint 
PMA-794 at http://www.proba.jussieu.fr/mathdoc/preprints/index.html 

[4] Bertoin, J. and Le Gall, J.-F. (2000). The Bolthausen-Sznitman coalescent and the
genealogy of continuous-state branching processes. Probab. Theory Relat. Fields 117,
249-266

[5] Bertoin, J. and Le Gall, J.-F. (2003). Stochastic flows associated to coalescent pro- 
cesses. Probab. Theory Relat. Fields 126 , 261-288 

[6] Bertoin, J. and Pitman, J. (2000). Two coalescents derived from the ranges of stable 
subordinators. Elect. J. Probab. 5, 1-17. Available via 
http://www.math.u-psud.fr/~ejpecp/ejp5contents.html 

[7] Bolthausen, E. and Sznitman, A.S. (1998). On Ruelle’s probability cascades and an 
abstract cavity method. Comm. Math. Physics 197 , 247-276 

[8] Duquesne, T. and Le Gall, J.-F. (2002). Random trees, Lévy processes and spatial
branching processes. Astérisque 281

[9] Fleischmann, K. and Siegmund-Schultze, R. (1977). The structure of reduced critical
Galton-Watson processes. Math. Nachr. 79, 233-241

[10] Grey, D. R. (1974). Asymptotic behaviour of continuous time, continuous state-space 
branching processes. J. Appl. Probab. 11, 669-677 

[11] Kallenberg, O. (1973). Canonical representations and convergence criteria for pro- 
cesses with interchangeable increments. Z. Wahrsch. verw. Gebiete 27 , 23-36 

[12] Kendall, D. G. (1966). Branching processes since 1873. J. London Math. Soc. 41 , 
385-406 

[13] Kingman, J. F. C. (1982). The coalescent. Stochastic Process. Appl. 13 , 235-248 

[14] Lamperti, J. (1967). The limit of a sequence of branching processes. Z. Wahrsch. 
verw. Gebiete 7 , 271-288 

[15] Lamperti, J. (1967). Continuous-state branching processes. Bull. Amer. Math. Soc. 
73 , 382-386 

[16] Le Gall, J.-F. (1989). Marches aléatoires, mouvement brownien et processus de
branchement. Séminaire de Probabilités XXIII, Lecture Notes in Math., 1372,
Springer, Berlin, 258-274

[17] Pitman, J. (1999). Coalescents with multiple collisions. Ann. Probab. 27 , 1870-1902 

[18] Pitman, J. (2002). Combinatorial Stochastic Processes. Lecture notes for the St Flour
summer school. To appear. Available at
http://stat-www.berkeley.edu/users/pitman/621.ps.Z

[19] Pitman, J. and Yor, M. (1997). The two-parameter Poisson-Dirichlet distribution 
derived from a stable subordinator. Ann. Probab. 25 , 855-900 

[20] Revuz, D. and Yor, M. (1999). Continuous martingales and Brownian motion, Third
edition. Springer-Verlag, Berlin







Jean Bertoin

Laboratoire de Probabilités et Modèles Aléatoires and Institut universitaire de
France,

Université Pierre et Marie Curie,

175, rue du Chevaleret,

F-75013 Paris,

France.

Christina Goldschmidt

Laboratoire de Probabilités et Modèles Aléatoires,

Université Pierre et Marie Curie,

175, rue du Chevaleret,

F-75013 Paris,

France.




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Semi-Markov Walks in Queueing and Risk 
Theory 

Mykola S. Bratiychuk



1. Introduction

Let $\{\eta_n(i)\}$, $\{\alpha_n(i)\}$, $i = 1, 2$, $n = 1, 2, \ldots$ be four sequences of non-negative totally
independent random variables having the same distribution within every sequence.
Let = 1, 2 be the renewal processes generated by o;n(i), i = 1, 2 respectively. 

The process 

AXl(t) 

eW=5]%(l)-5]7„(2)+^(0) (1) 

k=l k=l 

is called the semi-Markov random walk. 

Processes of the form (1) are often met in queueing theory, dam theory and risk
theory, because such processes are the natural mathematical models
of the problems arising there. The analytical theory of such walks was developed
in [1], [2]. Here we apply it to the study of batch arrival queueing systems and to
one class of non-Markovian risk processes.

We put 

fi{s) = P{t?i(1) = k}= Pk, q(z) = 



2. Results 

Let ^{t) be the number of customers in the batch arrival system of the type 
G'^I/G/1 at the time t. 

Theorem 1. For $m \geq 1$, $\lambda > 0$ the following representation holds true
r -.ml 



-|-0 oo 



+ j j e-^^&{t-y,m-k,X)dH{t,k)dP4y,X), 

’ ^=0-oo 0 



where $\Theta(y, k, \lambda)$, $H(y, k)$, $P_-(y, \lambda)$ are some known functions.

Now consider a risk process of the form







and $\tau = \inf\{t > 0 : \xi(t) < 0\}$. Then $\Phi(n) = P\{\tau < \infty \mid \xi(0) = n\}$ is the ruin
probability, provided the initial capital of the company is equal to $n$.

Theorem 2. Let there exist $s_0 < 0$ such that $f_2(-s_0)\, q(f_1(s_0)) > 1$. Then for some
$\varepsilon > 0$

$(n) = + o(/r"(-^ - £)) 

as $n \to \infty$, where $K(\nu)$ is some known constant and $s_0 < -\nu < 0$ is the solution
of the equation

$$f_2(-s)\, q(f_1(s)) = 1.$$



References 

[1] Korolyuk, V.S. and Pirliev B. Random walk on the half-axis on the superposition of
two renewal processes, Ukr. Matem. Zur. 36, N.4, 1984, 433-436.

[2] Bratiychuk M. S., Kempa W. Application of the superposition of renewal processes to 
the study of batch arrival queues, Queueing systems. 44, 2003, 51-67. 

Mykola S. Bratiychuk 

Silesian University of Technology, Gliwice, Poland 




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Representation of Fixed Points of a Smoothing 
Transformation 

Amke Caliebe 



ABSTRACT: Given a sequence of real random variables $T = (T_1, T_2, \ldots)$,
distributions $\mu$ which satisfy the following fixed point equation for distributions
are considered:

$$W \overset{d}{=} \sum_{j=1}^{\infty} T_j W_j,$$

where $W, W_1, W_2, \ldots$ have distribution $\mu$ and $T, W_1, W_2, \ldots$ are independent. Only
the case $\sum_j 1_{|T_j| > 0} < \infty$ almost surely is regarded in this article. These solutions
are known to arise, e.g., in the limiting behaviour of branching processes. Here,
such fixed points are characterized as mixtures of infinitely divisible distributions.
Depending on the properties of $T$ and of the fixed points in question, it can be
shown that the corresponding infinitely divisible distributions can be normal,
$\alpha$-stable ($\alpha < 1$) or degenerate.

1. Introduction 

Let $T = (T_1, T_2, \ldots)$ be a sequence of real random variables with $N := \sum_j 1_{|T_j| > 0}
< \infty$ almost surely. The smoothing transformation $K$ is defined as

$$K : \mathcal{D} \to \mathcal{D}, \qquad K(\mu) = \mathcal{L}\Bigl(\sum_{j=1}^{\infty} T_j W_j\Bigr)$$

with $\mathcal{D}$ the space of distributions and $W_1, W_2, \ldots$ random variables with
distribution $\mu$ and $T, W_1, W_2, \ldots$ independent. The random variables $T_j$ are also called
coefficients.

The object of this article is to characterize fixed points of $K$. These satisfy
the following equation for distributions $\mu$

$$W \overset{d}{=} \sum_{j=1}^{\infty} T_j W_j, \qquad (1)$$

where $W, W_1, W_2, \ldots$ have distribution $\mu$ and $T, W_1, W_2, \ldots$ are independent. A
random variable is called a fixed point if its distribution satisfies (1).
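For deterministic coefficients, fixed point equation (1) can be verified directly on transforms, since the transform of $\sum_j T_j W_j$ is then the product of the transforms of $W$ evaluated at $T_j t$. The following sketch is an illustration under that assumption (constant weights, hypothetical function names) and is not part of the article's argument. It checks two textbook cases: the centred normal law with weights $T_1 = T_2 = 2^{-1/2}$, and the positive $\alpha$-stable law with Laplace transform $\exp(-s^\alpha)$ and weights $T_1 = T_2 = 2^{-1/\alpha}$.

```python
import cmath
import math


def normal_cf(t, sigma=1.0):
    """Characteristic function of the centred normal law with variance sigma^2."""
    return cmath.exp(-0.5 * (sigma * t) ** 2)


def positive_stable_laplace(s, alpha):
    """Laplace transform exp(-s^alpha) of a positive alpha-stable law, 0 < alpha < 1."""
    return math.exp(-(s ** alpha))


def smoothed_transform(transform, weights, t):
    """Transform of sum_j T_j W_j for constant weights T_j and i.i.d. W_j:
    the product of the individual transforms evaluated at T_j * t."""
    result = 1.0
    for weight in weights:
        result *= transform(weight * t)
    return result
```

In both cases `smoothed_transform` reproduces the original transform, so the law is a fixed point of the smoothing transformation for these particular weights.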

Fixed point equation (1) has extensive applications (for an overview see [26], 
[27], and [32]). Here, only the field of branching processes will be shortly (and 
not comprehensively) mentioned. Since the first appearance of fixed points in the 
Galton-Watson process to characterize the limiting behaviour of the generation
size [4, 10, 18, 21, 23], they were also discovered in Bellman-Harris and
Crump-Mode-Jagers processes [3, 11, 15, 16]. They are of special interest in the case

* Research supported by the German Science Foundation (DFG) Grant RO 498/4-1. 







of the branching random walk [5, 6, 7, 8, 9, 30] and of the weighted branching 
process ([28], who used the expression marked trees instead; [33], [34]). Lately, 
it was shown that fixed points with finite expectation can always be viewed as 
limiting distributions of the corresponding weighted branching process [14, 25]. A 
nice recent survey about distributional fixed point equations is given in [1] . 

Fixed points of Eq. (1) are sometimes regarded as generalized stable distribu- 
tions. [29] calls them laws stable by random weighted mean. If the coefficients Tj are 
constant then it is known that all fixed points are mixtures of stable distributions 
(if the closed multiplicative group generated by the coefficients is uncountable; 
otherwise the situation is slightly more involved) [2] . In the far more complex situ- 
ation of random coefficients, this result is unknown. In this article, however, some 
steps are made in that direction. Recently [12], it was proved that fixed points 
are mixtures of infinitely divisible distributions. Here, for special cases of fixed 
points these distributions are calculated in more detail (under certain regularity 
and integrability assumptions): 

• For positive coefficients and positive fixed points the infinitely divisible
distributions are $\alpha$-stable with $0 < \alpha < 1$.

• For fixed points with finite, non-zero expectation and positive coefficients
the infinitely divisible distributions are constants.

• For fixed points with finite variance and zero expectation the infinitely
divisible distributions are normal.

Note that all these distributions are stable. Whether all kinds of fixed points are
mixtures of $\alpha$-stable distributions still remains to be investigated.



2. Infinitely Divisible and Stable Distributions 

This section displays some basic important features of infinitely divisible and stable 
distributions as needed in the following. It mainly aims at establishing a consistent 
notation (cp., e.g., [22]). The experienced reader may skip this section or consult it 
at convenience. For the results and proofs of this section compare the fundamental 
works of [19], [24], and [31]; for the representation of stable distributions see also 
[ 22 ]. 

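A probability measure $P$ on $\mathbb{R}$ with characteristic function $\varphi_P$ or a random
variable with distribution $P$ is called infinitely divisible iff for every $n \in \mathbb{N}$ there is
a characteristic function $\varphi_n$ with $\varphi_P = (\varphi_n)^n$.

Let $\mathfrak{B}_0$ denote the Borel $\sigma$-algebra on $\mathbb{R} \setminus \{0\}$. The function $\varphi : \mathbb{R} \to \mathbb{C}$ is
a characteristic function of an infinitely divisible distribution if and only if there
exist constants $\gamma \in \mathbb{R}$, $\sigma^2 \in \mathbb{R}_{\geq 0}$ and a (not necessarily finite) measure $\nu$ on $\mathfrak{B}_0$
with

$$\int_{\mathbb{R} \setminus \{0\}} \frac{u^2}{1 + u^2}\, \nu(du) < \infty$$

such that

$$\log \varphi(t) = i\gamma t - \frac{\sigma^2 t^2}{2} + \int \left(e^{itu} - 1 - \frac{itu}{1 + u^2}\right) \nu(du). \qquad (2)$$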

This representation is unique. It is called Lévy representation and $\nu$ is referred
to as the Lévy measure. Denote in the subsequent text the measure generating
function of $\nu$ by

$$Q(u) = \begin{cases} \nu((-\infty, u]) & : u < 0 \\ -\nu([u, \infty)) & : u > 0. \end{cases}$$



The stable distributions are an important subclass of the class of infinitely
divisible distributions. A distribution or random variable with distribution function
$F$ is called stable iff for any $a_1, a_2, b_1, b_2 \in \mathbb{R}$, $a_1 > 0$, $a_2 > 0$, there exist $a, b \in \mathbb{R}$,
$a > 0$, such that the equality $F(a_1 x + b_1) * F(a_2 x + b_2) = F(ax + b)$ is valid ($*$
denotes the convolution).

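Let $X$ be a stable random variable. Then its characteristic function $\varphi_X$ has
the representation

$$\log \varphi_X(t) = i\gamma t - c|t|^{\alpha}\bigl(1 - i\beta\, \mathrm{sgn}(t)\, \omega(t, \alpha)\bigr) \qquad (t \in \mathbb{R}) \qquad (3)$$

with parameters $\alpha \in (0, 2]$, $\beta \in [-1, 1]$, $\gamma \in \mathbb{R}$, and $c \in \mathbb{R}_{\geq 0}$, and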



(t a)- i tan(iaTr) : a^\ 
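As an illustration of representation (3), the sketch below evaluates $\log \varphi_X(t)$ for $\alpha \neq 1$, taking $\omega(t, \alpha) = \tan(\alpha\pi/2)$ in that case. The function name is hypothetical, and the case $\alpha = 1$ is deliberately left out because its form of $\omega$ is not legible in the text above.

```python
import math


def stable_log_cf(t, alpha, beta, gamma, c):
    """log of the characteristic function in representation (3), alpha != 1.

    Assumes omega(t, alpha) = tan(alpha * pi / 2) for alpha != 1."""
    if alpha == 1.0:
        raise ValueError("the case alpha = 1 is not covered by this sketch")
    sign = (t > 0) - (t < 0)
    omega = math.tan(0.5 * math.pi * alpha)
    return 1j * gamma * t - c * abs(t) ** alpha * (1 - 1j * beta * sign * omega)
```

With $\alpha = 2$ and $\beta = 0$ this reduces to $i\gamma t - c t^2$, the logarithm of a normal characteristic function with variance $2c$.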



The relation to the parameters of the Levy representation of infinitely divis- 
ible distributions (2) is the following: 

— 0 for a G (0, 2), 

rO „ .. foo „ ;ae(0,l) 



7=< 



and 



7 - aci - ac 2 

7 + (i- rftT)d« + 0C2/„”° ^ 

7 



a = l 

aG(l,2) 
a = 2 

(4) 



«<”)={ '"'““‘'‘'“o : a = 2 



[i<0 



C2l„>o) : aG(0,2) 



Cl and C2 are real non-negative numbers with ci + C2 > 0, ^ = cT+cJ"’ 

Q. G (0, 1) 
a = 1 
ae(l, 2 ) 

a = 2 



c= < 



-(ci + C2 ) cos( 7 tq;/ 2 ) /q°° {e~y - 1) y~'^~^dy 
(ci + C2)7 t/2 

-(ci + C 2 ) cos(7ro;/2) {e~y -1 + y) y~°^~^dy 
cr^/2 



(5) 



( 6 ) 



By the canonical representation (3) it can be seen that stable distributions 
admit a Lebesgue density. 



3. General Representation Result 

In this section the general representation result from [12] is given which states that 
fixed points are mixtures of infinitely divisible distributions. First the correspond- 
ing weighted branching process is introduced: 

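Let $V := \bigcup_{n \in \mathbb{Z}_+} \mathbb{N}^n$, $\mathbb{N}^0 := \{\emptyset\}$,
and let $T(v) : \Omega \to \mathbb{R}^{\mathbb{N}}$, $v \in V$, denote independent random variables with
$T(v) \overset{d}{=} T$. For $v = (v_1, v_2, \ldots, v_n) \in V$ let (for $k \in \mathbb{N}$, $k \leq n$) $v|_k := (v_1, \ldots, v_k)$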







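be the restriction to the first $k$ components and $v|_0 := \emptyset$. On the tree define the
random variables or weights $L(v) : \Omega \to \mathbb{R}$, $v \in V$, by $L(\emptyset) := 1$ and, for
$v = (v_1, \ldots, v_n)$,

$$L(v) := \prod_{k=1}^{n} T_{v_k}(v|_{k-1}).$$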

Let $L := (L(v))_{v \in V}$. Using recursion fixed point equation (1) can be expressed for
$n \in \mathbb{N}$ as

$$W \overset{d}{=} \sum_{|v| = n} L(v) W(v), \qquad (7)$$

where $W(v)$, $|v| = n$, are copies of $W$ and $L$, $W(v)$, $|v| = n$, are independent.

Let $W$ be a solution of fixed point equation (1) and define for
$l \in M := \{l \in \mathbb{R}^V : \#\{v \in V : |v| = n, l(v) \neq 0\} < \infty \text{ for all } n \in \mathbb{N}\}$ the random
variable

$$Y_n(l) := \sum_{|v| = n} l(v) W(v), \qquad (8)$$

with $W(v)$, $|v| = n$, copies of $W$ and $L$, $W(v)$, $|v| = n$, independent.

Theorem 3.1. (Representation; Theorem 1 of [12])

As $n \to \infty$ assume $\sup_{|v| = n} |L(v)| \to 0$ to hold almost surely. Let $W$ be a solution
of fixed point equation (1). For almost every $l \in \mathbb{R}^V$ with respect to the distribution of $L$ there exists
an infinitely divisible random variable $Y(l)$ with

$$Y_n(l) \overset{d}{\to} Y(l).$$

Denote by $\varphi(l)$ the characteristic function of $Y(l)$ and let $\gamma(l)$ and $\sigma^2(l)$ be the
corresponding constants and $Q(l)$ the corresponding measure generating function
of the Lévy representation (2) of $\varphi(l)$ (see Section 2). Then ($t \in \mathbb{R}$)

$$\varphi_W(t) = E\varphi(L)(t)
= E \exp\left(i\gamma(L)t - \frac{\sigma^2(L) t^2}{2} + \int \left(e^{itu} - 1 - \frac{itu}{1 + u^2}\right) dQ(L)(u)\right). \qquad (9)$$

The subsequent lemma shows how to compute $Q$, $\sigma^2$ and $\gamma$ from the
distribution function $F_W$ of a certain fixed point $W$. Note that only the tail behaviour
of $F_W$ is needed.



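Lemma 3.2. (Lemma 6 of [12]) As $n \to \infty$ let $\sup_{|v| = n} |L(v)| \to 0$ hold almost
surely and assume that $W$ is a fixed point of (1). Let $Q(l)$, $\sigma^2(l)$ and $\gamma(l)$ for
almost every $l \in \mathbb{R}^V$ with respect to the distribution of $L$ be as defined in Theorem 3.1. Then for all
continuity points $u$ of $Q(l)$

$$Q(l)(u) = \begin{cases} \lim_{n \to \infty} \sum_{|v| = n} P(l(v) W < u) & : u < 0 \\ \lim_{n \to \infty} \sum_{|v| = n} \bigl(P(l(v) W < u) - 1\bigr) & : u > 0, \end{cases} \qquad (10)$$

$$\lim_{\varepsilon \to 0} \liminf_{n \to \infty} \sum_{|v| = n} \left( l^2(v) \int_{|u| < \varepsilon/|l(v)|} u^2\, dF_W(u) - l^2(v) \Bigl( \int_{|u| < \varepsilon/|l(v)|} u\, dF_W(u) \Bigr)^2 \right)$$
$$= \lim_{\varepsilon \to 0} \limsup_{n \to \infty} \sum_{|v| = n} \left( l^2(v) \int_{|u| < \varepsilon/|l(v)|} u^2\, dF_W(u) - l^2(v) \Bigl( \int_{|u| < \varepsilon/|l(v)|} u\, dF_W(u) \Bigr)^2 \right)
= \sigma^2(l) \qquad (11)$$

and

$$\gamma_{(\tau)}(l) = \lim_{n \to \infty} \sum_{|v| = n} l(v) \int_{|u| < \tau/|l(v)|} u\, dF_W(u), \qquad (12)$$

where $\tau$ and $-\tau$ are continuity points of $Q(l)$. $\gamma(l)$ can be obtained by

$$\gamma(l) = \gamma_{(\tau)}(l) - \int_{|u| < \tau} \frac{u^3}{1 + u^2}\, dQ(l)(u) + \int_{|u| \geq \tau} \frac{u}{1 + u^2}\, dQ(l)(u). \qquad (13)$$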



4. Positive Solutions 

In this section non-negative solutions of fixed point equation (1) are considered.
It is always assumed that $T_j \geq 0$ for all $j \in \mathbb{N}$. Furthermore, it is supposed that

$$P(N = 0 \text{ or } 1) < 1 \quad \text{and} \quad P(\forall j \in \mathbb{N} : T_j = 1 \text{ or } T_j = 0) < 1$$

hold. Otherwise, the situation is easy (see Lemma 1.1 of [27]). Often, a stronger
assumption is needed:

Assumption 1. 

There exists a S > 0 such that ^ ^ < oo. 

In the investigation of fixed point equation (1) the following function is of
major importance (stated here for not necessarily non-negative coefficients)

$$m : \mathbb{R}_{\geq 0} \to \overline{\mathbb{R}}_{\geq 0}; \qquad \beta \mapsto E \sum_{j=1}^{\infty} |T_j|^{\beta}\, 1_{|T_j| > 0}.$$

It is easy to show that $m$ is strictly convex on $m^{-1}(\mathbb{R}_{\geq 0})$. In particular, there is
at most one $\alpha \in \mathbb{R}_{\geq 0}$ such that $m(\alpha) = 1$ and $m'(\alpha) \leq 0$ (e.g., [12, Lemma 1]).
Throughout the text it is always assumed that $m(0) > 1$. If $m(0) > 1$ does not
hold the situation is trivial ([13, Lemma 2]). Denote

$$\mathcal{F} = \{\mu \in \mathcal{D} \setminus \{\delta_0\} : \mu([0, \infty)) = 1 \text{ and } \mu \text{ is a fixed point of (1)}\}.$$

If $W$ is a random variable then the somewhat sloppy notation $W \in \mathcal{F}$ means that
the distribution of $W$ belongs to $\mathcal{F}$.

The fixed point equation (1) is called non-lattice if there is no $s > 0$ such
that

$$P(\forall j \in \mathbb{N} : \log T_j \in s\mathbb{Z} \text{ for } T_j > 0) = 1.$$

4.1. Existence 

Results regarding the existence and uniqueness of non-negative solutions of fixed
point equation (1), under certain integrability conditions on the $T_j$, were given in
Theorem 1 in [17] for constant $N$, and in Theorem 1.1 and Corollary 1.5 in [27]:

Theorem 4.1. (Theorem 1.1 and Corollary 1.5 of [27])

(i) Let $\inf_{\beta \in [0,1]} m(\beta) \le 1$. Then $\mathfrak{P} \neq \emptyset$. If $E \sum_{j=1}^{\infty} T_j \log^+ T_j < \infty$ and $EN < \infty$
the converse is also true.

(ii) Let Assumption 1 hold and assume $\mathfrak{P} \neq \emptyset$. According to (i) let $\alpha$ be the
unique point of $(0, 1]$ such that $m(\alpha) = 1$, $m'(\alpha) \le 0$. If fixed point equation
(1) is non-lattice or $\alpha = 1$ then (1) has only one non-negative solution up
to multiplicative constants.

If $W \in \mathfrak{P}$ and Assumption 1 holds we always denote by $\alpha$ the unique point
in $(0, 1]$ such that $m(\alpha) = 1$, $m'(\alpha) \le 0$.
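
To make the role of $\alpha$ concrete, the following Python sketch evaluates $m$ for a finitely supported law of $T$ and locates the root of $m(\alpha) = 1$ in $(0, 1]$ by bisection. The encoding of the law as a list of (probability, weights) pairs, the function names and the example law are assumptions made for this illustration only; they are not taken from the cited papers. Bisection is justified here only because the example $m$ is decreasing on $[0, 1]$.

```python
def m(beta, law):
    """m(beta) = E sum_j |T_j|^beta 1{|T_j| > 0} for a finitely supported law of T.

    law is a list of (probability, weights) pairs (a hypothetical encoding)."""
    return sum(prob * sum(abs(t) ** beta for t in weights if t != 0)
               for prob, weights in law)

def characteristic_exponent(law, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection for m(alpha) = 1 on [lo, hi].

    Assumes m(lo) > 1 >= m(hi) and that m is decreasing on the interval."""
    if not (m(lo, law) > 1.0 >= m(hi, law)):
        raise ValueError("m - 1 does not change sign on the interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if m(mid, law) > 1.0:
            lo = mid
        else:
            hi = mid
    return hi

# Hypothetical example: N = 2 and T_1 = T_2 = 1/4 deterministically,
# so that m(beta) = 2 * 4**(-beta) and m(1/2) = 1.
example_law = [(1.0, (0.25, 0.25))]
alpha = characteristic_exponent(example_law)
```

In this example $m(0) = 2 > 1$ and $m'(\alpha) < 0$, so the sketch corresponds to the case $m'(\alpha) < 0$ treated below.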

The following is known regarding the tail behaviour ([17, Theorem 2], for
constant $N$; [27, Corollary 1.6]):

Theorem 4.2. (Corollary 1.6 of [27]) Let Assumption 1 hold and assume that fixed
point equation (1) is non-lattice. Let the random variable $W \in \mathfrak{P}$ have distribution
function $F$. Assume $\alpha < 1$.

(i) If $m'(\alpha) < 0$ then there is a constant $C_1 > 0$ such that

\[ 1 - F(u) \sim C_1 u^{-\alpha} \quad \text{for } u \to \infty. \]

(ii) If $m'(\alpha) = 0$ then there is a constant $C_2 > 0$ such that

\[ 1 - F(u) \sim C_2 u^{-\alpha} \log u \quad \text{for } u \to \infty. \]

4.2. The Representation 

Let

\[ \mathcal{F}_n := \sigma\{T(v) : v \in V, |v| < n\}. \tag{14} \]

Define for $s > 0$ the operator

\[ K_s(\mu) = \mathcal{L}\Bigl( \sum_{j=1}^{\infty} T_j^s X_j \Bigr), \tag{15} \]

where $X_1, X_2, \ldots$ have distribution $\mu$ and $T, X_1, X_2, \ldots$ are independent.

That $Y(l)$ of Theorem 3.1 is $\alpha$-stable in the context of positive fixed points
and coefficients is shown in the next theorem. This is the main new result of
this article. Thus, it is derived that each fixed point is a mixture of $\alpha$-stable
distributions. Associated results are given in the proof of Theorem 3.1 in [17],
in Section II, Proposition 1 in [20], in Section 5 in [27], and in Theorem 2.5 of
[28]. The proof of this theorem and of its corollary, which shows the existence of
Lebesgue densities, is given in the appendix.

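Theorem 4.3. Let Assumption 1 hold and assume that fixed point equation (1)
is non-lattice. Let $\mathfrak{P} \neq \emptyset$ and $W$ be the (up to multiplicative constants) unique
non-trivial solution of (1). Assume $\alpha < 1$. If $C_1$ and $C_2$ are chosen according to
Theorem 4.2(i) and (ii), then ($Q$, $\sigma^2$, $\gamma$ as in the representation of Theorem 3.1):

(i) Case $m'(\alpha) < 0$: $\sum_{|v|=n} L^{\alpha}(v)$ is a non-negative martingale with respect
to $(\mathcal{F}_n)_{n\in\mathbb{N}}$ and converges almost surely and in $L_1$ to a non-negative, $L$-
measurable random variable $Z_1$ with $EZ_1 = 1$. $Z_1$ is the (up to mul-
tiplicative constants) unique non-trivial, non-negative fixed point of $K_\alpha$.
Furthermore, almost surely

\[ Q(L)(u) = \begin{cases} 0, & u < 0, \\ -C_1 Z_1 u^{-\alpha}, & u > 0, \end{cases} \]

\[ \sigma^2(L) = 0, \]

\[ \gamma(L) = \frac{\pi}{2 \sin\bigl(\frac{(1-\alpha)\pi}{2}\bigr)} \, C_1 \alpha Z_1. \]

(ii) Case $m'(\alpha) = 0$: $\sum_{|v|=n} L^{\alpha}(v) \log L(v)$ is a martingale with respect to
$(\mathcal{F}_n)_{n\in\mathbb{N}}$ with expectation 0 and converges almost surely to a non-positive,
$L$-measurable random variable $Z_2$ with $P(Z_2 < 0) > 0$. Furthermore, al-
most surely

\[ Q(L)(u) = \begin{cases} 0, & u < 0, \\ C_2 Z_2 u^{-\alpha}, & u > 0, \end{cases} \]

\[ \sigma^2(L) = 0, \]

\[ \gamma(L) = -\frac{\pi}{2 \sin\bigl(\frac{(1-\alpha)\pi}{2}\bigr)} \, C_2 \alpha Z_2. \]

(iii) For almost every $l \in \mathbb{R}^V$ with respect to $P^L$ the random variable $Y(l)$ of
Theorem 3.1 is $\alpha$-stable with $P^L(Y(l) \text{ is non-degenerate}) > 0$, i.e. $\varphi_W$ is
the integral over characteristic functions of $\alpha$-stable random variables. In
particular:

Case $m'(\alpha) < 0$: $\varphi_W(t) = E \exp\bigl( -N_1 Z_1 |t|^{\alpha} \bigl( 1 - i \, \mathrm{sgn}(t) \tan(\tfrac{\pi\alpha}{2}) \bigr) \bigr)$
with $N_1 = -C_1 \cos(\pi\alpha/2) \int_0^{\infty} (e^{-y} - 1) \, y^{-\alpha-1} \, dy$.

Case $m'(\alpha) = 0$: $\varphi_W(t) = E \exp\bigl( -N_2 Z_2 |t|^{\alpha} \bigl( 1 - i \, \mathrm{sgn}(t) \tan(\tfrac{\pi\alpha}{2}) \bigr) \bigr)$
with $N_2 = C_2 \cos(\pi\alpha/2) \int_0^{\infty} (e^{-y} - 1) \, y^{-\alpha-1} \, dy$.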
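
The integral in the constants $N_1$ and $N_2$ has a closed form: for $0 < \alpha < 1$, integration by parts gives $\int_0^{\infty} (e^{-y} - 1) y^{-\alpha-1} dy = -\Gamma(1-\alpha)/\alpha = \Gamma(-\alpha) < 0$. The Python sketch below evaluates the two constants from this identity. The function names are illustrative, and the value 1.0 used for $C_1$ and $C_2$ is a placeholder, since Theorem 4.2 only asserts that these constants exist.

```python
import math

def tail_integral(alpha):
    """Value of the integral of (exp(-y) - 1) * y**(-alpha - 1) over (0, inf).

    For 0 < alpha < 1 integration by parts gives -Gamma(1 - alpha) / alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return -math.gamma(1.0 - alpha) / alpha

def N1(C1, alpha):
    # Constant of the case m'(alpha) < 0; positive because the integral is negative.
    return -C1 * math.cos(math.pi * alpha / 2.0) * tail_integral(alpha)

def N2(C2, alpha):
    # Constant of the case m'(alpha) = 0; negative, so N2 * Z2 >= 0 when Z2 <= 0.
    return C2 * math.cos(math.pi * alpha / 2.0) * tail_integral(alpha)

alpha = 0.5
n1 = N1(1.0, alpha)  # C1 = 1.0 is a placeholder value
n2 = N2(1.0, alpha)  # C2 = 1.0 is a placeholder value
```

With these signs both $N_1 Z_1$ and $N_2 Z_2$ are non-negative, as a scale parameter of a stable law has to be.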



Let $g : [0,1] \to [0,1]$; $t \mapsto \sum_{j=0}^{\infty} P(N = j) \, t^j$ be the generating function of
$N$. Note that, due to Lemma 3.1 of Liu (1998), the existence of a non-trivial non-
negative solution $W$ of (1) ensures the existence of a unique fixed point $q$ of $g$ in
$[0, 1)$ and $P(W = 0) = q$.
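
As a small numerical illustration of this fixed point, the Python sketch below iterates $q \leftarrow g(q)$ starting from 0, which increases to the smallest fixed point of a probability generating function in $[0, 1]$. The dictionary encoding of the law of $N$ and the example law are assumptions made for the illustration.

```python
def generating_function(pmf, t):
    """g(t) = sum_j P(N = j) * t**j for a finitely supported pmf {j: P(N = j)}."""
    return sum(p * t ** j for j, p in pmf.items())

def smallest_fixed_point(pmf, tol=1e-14, max_iter=100000):
    """Iterate q <- g(q) from q = 0 until successive iterates differ by less than tol."""
    q = 0.0
    for _ in range(max_iter):
        q_next = generating_function(pmf, q)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q

# Hypothetical example: N = 0 with probability 1/4 and N = 2 with probability 3/4.
# Then g(t) = 1/4 + 3/4 * t**2, whose fixed points are 1/3 and 1.
pmf = {0: 0.25, 2: 0.75}
q = smallest_fixed_point(pmf)
```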



Corollary 4.4. Under the assumptions of the preceding theorem, $W$ has a Lebesgue
density $f$ on $\mathbb{R} \setminus \{0\}$. In particular: If $f_{\alpha,\beta,\gamma,c}$ denotes the Lebesgue density of a
stable distribution with corresponding parameters (see (3)) then, almost surely with
respect to the Lebesgue measure,

(i) Case $m'(\alpha) < 0$: $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}_{\ge 0}$; $t \mapsto E f_{\alpha,1,M_1 Z_1,N_1 Z_1}(t) 1_{Z_1 \neq 0}$.

(ii) Case $m'(\alpha) = 0$: $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}_{\ge 0}$; $t \mapsto E f_{\alpha,1,M_2 Z_2,N_2 Z_2}(t) 1_{Z_2 \neq 0}$.

Remark 4.1. Theorem 2.3 of Liu (2001) guarantees the existence of a Lebesgue
density if $E T_r^{-\varepsilon} 1_{r<\infty} < \infty$ for $r := \min\{j \in \mathbb{N} : T_j > 0\}$ for some $\varepsilon > 0$, also for
the lattice case and $\alpha = 1$.



5. Solutions with Finite Expectation 

In this section we always assume that the coefficients $T_j$ are non-negative.

We denote by $\mathfrak{E}$ the set of solutions of fixed-point equation (1) with finite
non-zero expectation.

5.1. Existence

The next theorem is a direct combination of Corollary 2.1 and Remark 2.1 of [14].
The proof of this corollary applies the result of [30]. The theorem gives necessary
and sufficient conditions for the existence of solutions with finite non-zero expec-
tation. It can also be generalized to include the case $P(N < \infty) < 1$ (see Corollary
2.1 of [14]).





Theorem 5.1. (Cor. 2.1 and Rem. 2.1 of [14]) Suppose that t

exists and is finite. Assume ETj < oo for all j € N. Then, 




5.2. The Representation 

For fixed points $W \in \mathfrak{E}$ the random variables $Y(l)$ of Theorem 3.1 are degenerate,
i.e. constant.

Theorem 5.2. Assume that $ET_j < \infty$ for all $j \in \mathbb{N}$ and that $\mathfrak{E} \neq \emptyset$ with $W \in \mathfrak{E}$.

(i) $m(1) = 1$ and $\sum_{|v|=n} L(v)$ is a non-negative martingale with respect to
$(\mathcal{F}_n)_{n\in\mathbb{N}}$ (see (14)) and converges almost surely and in $L_1$ to a non-
negative, $L$-measurable random variable $Z$ with $EZ = 1$. Furthermore,
in the representation of Theorem 3.1, the following statements are true
almost surely:

\[ Q(L) = 0, \qquad \sigma^2(L) = 0, \qquad \gamma(L) = Z \, EW. \]

(ii) For almost every $l \in \mathbb{R}^V$ with respect to $P^L$ the random variable $Y(l)$ of
Theorem 3.1 is a constant.

Since $\gamma(L) = Z \, EW$ for $W \in \mathfrak{E}$, Theorem 5.2 also shows the uniqueness of
solutions (up to multiplicative constants).

Proof: Assume $ET_j < \infty$ for all $j \in \mathbb{N}$ and that $\mathfrak{E} \neq \emptyset$ with $W \in \mathfrak{E}$. That
$m(1) = 1$ and that $\sum_{|v|=n} L(v)$ is a non-negative martingale with respect to $(\mathcal{F}_n)_{n\in\mathbb{N}}$
which converges almost surely to a non-negative, $L$-measurable random variable $Z$
with $0 < EZ < \infty$, was proved in Theorem 2.2 and Proposition 2.1 of [14]. Because
$W$ has finite expectation, certain statements about the tail behaviour are known.
These can be used to calculate $Q$, $\sigma^2$ and $\gamma$ directly from Eqs. (10) - (13). A very
similar procedure was performed in detail in part (vi) in the proof of Theorem 2.3
in [14]. □

6. Solutions with Finite Variance 

In this section it is assumed that = 1) < 1. This is not necessary 

(Theorem 18 of [13]). However, it clarifies the situation and avoids the distinction 
between several cases. In this section the assumption $T_j \ge 0$ of Sections 4 and 5
is not necessary.

Denote by resp. 3^, the set of non-trivial solutions of fixed point equation 
(1) with finite variance and non-zero, resp. zero, expectation. These cases have to 
be treated separately. For set the subsequent Assumption 2 and for set 3^ 
Assumption 3 is necessary. 

Assumption 2. EY^^iTf < 00 and converges in L 2 . 

\ /n€N 

00 

Assumption 3. E^l^i Tj < 00 and E ^ Tj logT^^ < 00 . 





6.1. Existence 

Sufficient and necessary conditions for the existence of solutions with finite variance 
are given in the next theorem. This is a result of [13]. It does not require that 
P{N <oo) = l. 

Theorem 6.1. (Theorem 3 of [13])

(i) Suppose that Assumption 2 holds. Then 

oo oo 

5 - 1 ^ 0 ^ <1 andE'^Tj = 1 . 

j=l j=l 

(ii) Suppose that Assumption 3 holds. Then 



7^0 

OO OO 

Ej^Tj = 1,E <0 andE 



log- 

j = l 




< OO. 



(hi) Suppose that Assumption 2 or 3 holds. ^0, the solution of Eq. 

(1) with finite variance is unique up to multiplicative constants. 



6.2. The Representation 

The subsequent theorem shows the representation in the case of zero expectation.
(For $W$ with finite variance and non-zero expectation the results for finite
expectation apply since $EW \neq 0$.) Fixed points with finite
variance and expectation 0 are mixtures of normal distributions with expectation
0 and a random variance which satisfies a related fixed point equation. Recall the
definition of $K_s$ from Eq. (15).

Theorem 6.2. Suppose that Assumption 3 holds. Let the set of non-trivial solutions
with finite variance and zero expectation be non-empty.

(i) $\sum_{|v|=n} L^2(v)$ is a non-negative martingale with respect to $(\mathcal{F}_n)_{n\in\mathbb{N}}$ and
converges a.s. and in $L_1$ to a non-negative, $L$-measurable random variable
$Z$ with $EZ = 1$. $Z$ is the (up to multiplicative constants) unique non-trivial
fixed point of $K_2$ with finite expectation. Furthermore, in the representation
of Theorem 3.1, the following statements are true almost surely:

\[ Q(L) = 0, \qquad \sigma^2(L) = Z \, \mathrm{Var}\, W, \qquad \gamma(L) = 0. \]

(ii) For almost every $l \in \mathbb{R}^V$ with respect to $P^L$ the random variable $Y(l)$ of
Theorem 3.1 is normally distributed, i.e. $\varphi_W$ is the integral over character-
istic functions of normal random variables. In particular:

\[ \varphi_W(t) = E \exp\Bigl( -\frac{t^2}{2} Z \, \mathrm{Var}\, W \Bigr) \quad (t \in \mathbb{R}). \tag{16} \]

Proof: The proof is similar to that of Theorem 5.2. Instead of part (vi) in the proof 
of Theorem 2.3 in [14], part (vi) of Lemma 16 of [13] is employed. □ 

Acknowledgements 

The author would like to thank Uwe Rösler and Gerold Alsmeyer for stimulating
conversations. She is also grateful to two anonymous referees for a careful reading 
of the manuscript. 







Appendix A. Proof of Theorem 4.3

Before performing the proof of Theorem 4.3, two preliminary results are stated.
Assume $T_j \ge 0$ for all $j \in \mathbb{N}$. For the proof of the first lemma compare Lemma 3.2
in [14]. It was also proved in Lemma 7.2 of [27] and implicitly in Lemma 2.2 of [8]
applying a result of [6].

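Lemma A.1. Suppose that there exists an $\alpha \in (0, \infty)$ such that $m(\alpha) = 1$ and a
$\delta > 0$ such that $m(\alpha + \delta) < \infty$. Then, as $n \to \infty$,

\[ \sup_{|v|=n} L(v) \to 0 \quad \text{almost surely.} \]

In particular, the result holds if $\mathfrak{P} \neq \emptyset$ and Assumption 1 is satisfied.

Corollary A.2. Suppose that there exists an $\alpha \in (0, \infty)$ such that $m(\alpha) = 1$. Then,
for every $\beta > \alpha$ such that there exists a $\delta > 0$ with $\tilde{m}(\alpha/(\beta - \alpha) + \delta) < \infty$
(with $\tilde{m}$ as defined in the proof below), as $n \to \infty$,

\[ \sum_{|v|=n} L^{\beta}(v) \to 0 \quad \text{almost surely.} \]

Proof: Choose $\alpha$ and $\beta$ according to the statement of the corollary. We consider
the function

\[ \tilde{m} : \mathbb{R}_{\ge 0} \to \overline{\mathbb{R}}_{\ge 0}; \quad \gamma \mapsto E \sum_{j=1}^{\infty} T_j^{(\beta-\alpha)\gamma} 1_{T_j>0}. \]

Then we know that $\tilde{m}(\alpha/(\beta - \alpha)) = 1$. Therefore, by the preceding lemma applied to
the coefficients $T_j^{\beta-\alpha}$, as $n \to \infty$,

\[ \sup_{|v|=n} L^{\beta-\alpha}(v) \to 0 \quad \text{almost surely.} \]

Furthermore, $\sum_{|v|=n} L^{\alpha}(v)$ is a non-negative martingale and converges almost
surely. Thus,

\[ \sum_{|v|=n} L^{\beta}(v) \le \sup_{|v|=n} L^{\beta-\alpha}(v) \sum_{|v|=n} L^{\alpha}(v) \to 0 \quad \text{almost surely as } n \to \infty. \quad \Box \]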

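Proof of Theorem 4.3:

Let Assumption 1 hold. Let $W \in \mathfrak{P}$, $\alpha < 1$ and fixed point equation (1) be non-
lattice. Choose $C_1$, $C_2$ according to Theorem 4.2(i) and (ii).

(a) Martingales and Calculation of $Q$

Case $m'(\alpha) < 0$: That $\sum_{|v|=n} L^{\alpha}(v)$ is a non-negative martingale follows directly,
thereby guaranteeing the almost sure convergence. Consider the function

\[ \tilde{m} : \mathbb{R}_{\ge 0} \to \overline{\mathbb{R}}_{\ge 0}; \quad \gamma \mapsto E \sum_{j=1}^{\infty} T_j^{\alpha\gamma} 1_{T_j>0} \]

corresponding to the operator $K_\alpha$. Note that $\tilde{m}(\gamma) = m(\alpha\gamma)$, $\tilde{m}(1) = 1$ and
$\tilde{m}'(1) = \alpha m'(\alpha) < 0$. Therefore, $\lim_{n\to\infty} \sum_{|v|=n} L^{\alpha}(v)$ is a non-trivial fixed point
of $K_\alpha$ with mean 1 [30]. Denote by $F$ the distribution function of $W$ and let $u > 0$.
By Lemma 3.2 it follows that $P^L$-almost surely for $l \in \mathbb{R}^V$

\[ Q(l)(u) = \lim_{n\to\infty} \sum_{|v|=n} \bigl( P(l(v)W < u) - 1 \bigr). \]

Let $\delta > 0$, $\delta < C_1$. Choose by Theorem 4.2(i) an $x_0 \in \mathbb{R}_{>0}$ such that for every
$x > x_0$

\[ x^{-\alpha}(C_1 - \delta) < (1 - F(x)) < x^{-\alpha}(C_1 + \delta). \]

Due to Lemma A.1 it is possible to choose $n_0 := n_0(l) \in \mathbb{N}$ such that $u/l(v) > x_0$
for each $v \in V$, $|v| \ge n_0$ ($P^L$-almost surely). Then for every $n \ge n_0$

\[ -(C_1 + \delta) u^{-\alpha} \sum_{|v|=n} l^{\alpha}(v) \le \sum_{|v|=n} \Bigl( F\Bigl(\frac{u}{l(v)}\Bigr) - 1 \Bigr) \le -(C_1 - \delta) u^{-\alpha} \sum_{|v|=n} l^{\alpha}(v). \]

Thus

\[ -(C_1 + \delta) u^{-\alpha} \lim_{n\to\infty} \sum_{|v|=n} l^{\alpha}(v) \le Q(l)(u) \le -(C_1 - \delta) u^{-\alpha} \lim_{n\to\infty} \sum_{|v|=n} l^{\alpha}(v). \]

Letting $\delta \to 0$ we obtain $P^L$-almost surely that

\[ Q(l)(u) = -C_1 u^{-\alpha} \lim_{n\to\infty} \sum_{|v|=n} l^{\alpha}(v), \]

therefore deriving the desired equality.

Case $m'(\alpha) = 0$: It can be proved directly that $\sum_{|v|=n} L^{\alpha}(v) \log L(v)$ is a mar-
tingale. Note that $E \sum_{j=1}^{\infty} T_j^{\alpha} = 1$ and $E \sum_{j=1}^{\infty} T_j^{\alpha} \log T_j = 0$ since $m(\alpha) = 1$ and
$m'(\alpha) = 0$. The remaining statements are gained analogously to the case $m'(\alpha) < 0$.

□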



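(b) Calculation of $\sigma^2$

Case $m'(\alpha) < 0$: As in part (a) let $\delta > 0$, $\delta < C_1$ and choose $x_0 \in \mathbb{R}_{>0}$ such that for
every $x > x_0$

\[ x^{-\alpha}(C_1 - \delta) < (1 - F(x)) < x^{-\alpha}(C_1 + \delta). \]

Then let $\varepsilon > 0$ and choose $n_0 := n_0(l) \in \mathbb{N}$ such that $\varepsilon/l(v) > x_0$ for each $v \in V$,
$|v| \ge n_0$ ($P^L$-almost surely). By partial integration we obtain for each $v \in V$,
$|v| \ge n_0$,

\[ l^2(v) \int_0^{\varepsilon/l(v)} u^2 \, dF(u) = \varepsilon^2 F\Bigl(\frac{\varepsilon}{l(v)}\Bigr) - 2 l^2(v) \int_0^{x_0} u F(u) \, du + 2 l^2(v) \int_{x_0}^{\varepsilon/l(v)} u (1 - F(u)) \, du - 2 l^2(v) \int_{x_0}^{\varepsilon/l(v)} u \, du \]
\[ \le l^{\alpha}(v) \varepsilon^{2-\alpha} \Bigl( \frac{2}{2-\alpha}(C_1 + \delta) - (C_1 - \delta) \Bigr) + l^2(v) \Bigl( x_0^2 - 2 \int_0^{x_0} u F(u) \, du - \frac{2}{2-\alpha}(C_1 + \delta) x_0^{2-\alpha} \Bigr). \]

Applying Eq. (11) gives

\[ \sigma^2(l) \le \lim_{\varepsilon\to 0} \limsup_{n\to\infty} \sum_{|v|=n} l^2(v) \int_0^{\varepsilon/l(v)} u^2 \, dF(u) \]
\[ \le \lim_{\varepsilon\to 0} \limsup_{n\to\infty} \Bigl[ \varepsilon^{2-\alpha} \Bigl( \frac{2}{2-\alpha}(C_1 + \delta) - (C_1 - \delta) \Bigr) \sum_{|v|=n} l^{\alpha}(v) + \Bigl( x_0^2 - 2 \int_0^{x_0} u F(u) \, du - \frac{2}{2-\alpha}(C_1 + \delta) x_0^{2-\alpha} \Bigr) \sum_{|v|=n} l^2(v) \Bigr]. \]

Corollary A.2 guarantees that the last sum goes to 0 for $n \to \infty$ ($P^L$-almost
surely). Since $\sum_{|v|=n} l^{\alpha}(v)$ converges for $n \to \infty$, this results in $\sigma^2(l) = 0$ $P^L$-
almost surely.

In the case $m'(\alpha) = 0$ the arguments are similar. Note that now $\tilde{m}'(1) = 0$ and
therefore $\lim_{n\to\infty} \sum_{|v|=n} L^{\alpha}(v) = 0$ almost surely by [30]. □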

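(c) Calculation of $\gamma$

Case $m'(\alpha) < 0$: Let $\tau \in \mathbb{R}_{>0}$. With the same reasoning as in part (b) we obtain
for each $\delta > 0$ almost surely

\[ \gamma(\tau)(L) \le \tau^{1-\alpha} \Bigl( \frac{1}{1-\alpha}(C_1 + \delta) - (C_1 - \delta) \Bigr) \lim_{n\to\infty} \sum_{|v|=n} L^{\alpha}(v), \]

\[ \gamma(\tau)(L) \ge \tau^{1-\alpha} \Bigl( \frac{1}{1-\alpha}(C_1 - \delta) - (C_1 + \delta) \Bigr) \lim_{n\to\infty} \sum_{|v|=n} L^{\alpha}(v). \]

For $\delta \to 0$ we obtain ($Z_1 := \lim_{n\to\infty} \sum_{|v|=n} L^{\alpha}(v)$)

\[ \gamma(\tau)(L) = \tau^{1-\alpha} \frac{\alpha}{1-\alpha} C_1 Z_1. \]

By using the representation of $Q(L)$ in part (a) and Eq. (13), we derive almost
surely

\[ \gamma(L) = C_1 \alpha Z_1 \Bigl( \frac{\tau^{1-\alpha}}{1-\alpha} - \int_0^{\tau} \frac{u^{2-\alpha}}{1+u^2} \, du + \int_{\tau}^{\infty} \frac{u^{-\alpha}}{1+u^2} \, du \Bigr). \]

$\tau \to 0$ results in

\[ \gamma(L) = C_1 \alpha Z_1 \int_0^{\infty} \frac{u^{-\alpha}}{1+u^2} \, du = C_1 \alpha Z_1 \frac{\pi}{2 \sin\bigl(\frac{(1-\alpha)\pi}{2}\bigr)}. \]

The proof of case $m'(\alpha) = 0$ is similar. Observe that if $\lim_{n\to\infty} \sum_{|v|=n} L^{\alpha}(v)
\log L(v) = 0$ almost surely, the solution $W$ would be trivial contrary to the as-
sumption. □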
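
The integral $\int_0^{\infty} u^{-\alpha}/(1+u^2) \, du$ that determines $\gamma(L)$ in part (c) equals $\pi / (2\cos(\pi\alpha/2)) = \pi / (2\sin((1-\alpha)\pi/2))$ for $0 < \alpha < 1$. The Python sketch below checks this numerically. The substitution $u = e^x$ and the trapezoidal rule are choices made for this illustration and are not part of the proof; the function names are likewise illustrative.

```python
import math

def integral_numeric(alpha, half_width=80.0, step=0.01):
    """Trapezoidal rule for the integral of u**(-alpha) / (1 + u**2) over (0, inf).

    After u = exp(x) the integrand is exp((1 - alpha) * x) / (1 + exp(2 * x)),
    which decays exponentially in both directions for 0 < alpha < 1."""
    n = int(round(2.0 * half_width / step))
    total = 0.0
    for k in range(n + 1):
        x = -half_width + k * step
        value = math.exp((1.0 - alpha) * x) / (1.0 + math.exp(2.0 * x))
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * value
    return total * step

def closed_form_value(alpha):
    """pi / (2 cos(pi alpha / 2)), the claimed value of the integral."""
    return math.pi / (2.0 * math.cos(math.pi * alpha / 2.0))

numeric_half = integral_numeric(0.5)
exact_half = closed_form_value(0.5)
```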

(d) Proof of (iii)

Parts (a), (b) and (c) together with Eqs. (3) - (6) show that $Y(l)$ has $P^L$-
almost surely a stable distribution and $\alpha(L) = \alpha$, $\beta(L) = 1$, $c_1(L) = 0$ and
$\gamma(L) = 0$. Furthermore, $c(L) = -C_2(L) \cos(\pi\alpha/2) \int_0^{\infty} (e^{-y} - 1) y^{-\alpha-1} \, dy$ with
$C_2(L) = C_1 Z_1$ if $m'(\alpha) < 0$ and $C_2(L) = C_2 Z_2$ if $m'(\alpha) = 0$. □

Proof of Corollary 4.4:

Let Assumption 1 hold and assume that fixed point equation (1) is non-lattice. Let
$\mathfrak{P} \neq \emptyset$ and $W$ be the (up to multiplicative constants) unique non-trivial solution
of (1). Assume $\alpha < 1$.

Case (i): From Theorem 4.3 (i) it follows that

\[ \varphi_W(t) = P(Z_1 = 0) + E \exp\bigl( i M_1 Z_1 t - N_1 Z_1 |t|^{\alpha} \bigl( 1 - i \, \mathrm{sgn}(t) \tan(\tfrac{\pi\alpha}{2}) \bigr) \bigr) 1_{Z_1 > 0} \]
\[ = P(Z_1 = 0) + \int E\bigl( f_{\alpha,1,M_1 Z_1,N_1 Z_1}(u) 1_{Z_1 > 0} \bigr) \exp(iut) \, du. \]

The right hand side is the characteristic function of a distribution with point mass
$P(Z_1 = 0)$ at 0 and Lebesgue density $f$ on $\mathbb{R} \setminus \{0\}$ as stated in the corollary. Applying
the uniqueness of the characteristic function completes the proof.

Case (ii): The proof is along the same lines as the proof of case (i). □



References 

[1] Aldous, D. J. and Bandyopadhyay, A. (2004). A survey of max-type recursive
distributional equations. Preprint, arXiv:math.PR/0401388.

[2] Alsmeyer, G. and Roesler, U. (2003). A stochastic fixed-point equation related
to weighted branching with deterministic weights. Bericht 10/03-S Angewandte
Mathematik, FB 10, University of Münster,
http://wwwmath.uni-muenster.de/inst/Statistik/alsmeyer/Publikationen.html.

[3] Athreya, K. B. and Ney, P. E. (1972). Branching Processes. Springer, Berlin. 

[4] Athreya, K. B. (1971). On the absolute continuity of the limit random variable
in the supercritical Galton-Watson branching process. Proc. Amer. Math. Soc. 30,
563-565.

[5] Biggins, J. D. (1977a). Martingale convergence in the branching random walk. J. 
Appl. Prob. 14 , 25-37. 

[6] Biggins, J. D. (1977b). Chernoff’s theorem in the branching random walk. J. Appl. 
Prob. 14 , 630-636. 

[7] Biggins, J. D. and Grey, D. R. (1979). Continuity of limit random variables in 
the branching random walk. J. Appl. Prob. 16 , 740-749. 

[8] Biggins, J. D. and Kyprianou, A. E. (1997). Seneta-Heyde norming in the branch- 
ing random walk. Ann. Probab. 25, 337-360. 

[9] Biggins, J. D. and Kyprianou, A. E. (2004). Fixed points of the smoothing trans-
form; the boundary case. Preprint, http://www.shef.ac.uk/~stljdb/tsttbc.html.

[10] Bingham, N. H. and Doney, R. A. (1974). Asymptotic properties of supercritical
branching processes I: The Galton-Watson process. Adv. Appl. Prob. 6, 711-731.

[11] Bingham, N. H. and Doney, R. A. (1975). Asymptotic properties of supercritical 
branching processes II: Crump-Mode and Jirina processes. Adv. Appl. Prob. 7 , 66-82. 

[12] Caliebe, A. (2003). Symmetric fixed points of a smoothing transformation. Adv.
Appl. Prob. 35, 377-394.

[13] Caliebe, A. and Rösler, U. (2003). Fixed points with finite variance of a smooth-
ing transformation. Stochastic Process. Appl. 107, 105-129.

[14] Caliebe, A. and Rösler, U. (2004). Fixed points of a smoothing transformation
with finite expectation: closing a gap. Preprint,
http://www-computerlabor.math.uni-kiel.de/stochastik/caliebe

[15] Crump, K. and Mode, C. J. (1968). A general age-dependent branching process
(I). J. Math. Anal. Appl. 24, 497-508.

[16] Crump, K. and Mode, C. J. (1969). A general age-dependent branching process
(II). J. Math. Anal. Appl. 25, 8-17.

[17] Durrett, R. and Liggett, T. (1983). Fixed points of the smoothing transforma- 
tion. Z. Wahrscheinlichkeitsth. 64, 275-301. 

[18] Geiger, J. (2000). A new proof of Yaglom's exponential limit law. In Algorithms,
Trees, Combinatorics and Probability, eds. D. Gardy and A. Mokkadem. Trends in
Mathematics, Birkhäuser, Basel, pp. 245-249.

[19] Gnedenko, B. V. and Kolmogorov, A. N. (1968). Limit Distributions for Sums
of Independent Random Variables. Addison-Wesley, Reading.

[20] Guivarc'h, Y. (1990). Sur une extension de la notion de loi semi-stable. Ann. Inst.
Henri Poincaré - Probabilités et Statistiques 26, 261-285.







[21] Harris, T. E. (1948). Branching processes. Ann. Math. Stat. 19, 474-494. 

[22] Hall, P. (1981). A comedy of errors: the canonical form for a stable characteristic 
function. Bull. London Math. Soc. 13 , 23-27. 

[23] Heyde, C. C. (1970). Extension of a result of Seneta for the super-critical Galton-
Watson process. Ann. Math. Stat. 41, 739-742.

[24] Ibragimov, I. A. and Linnik, Yu. V. (1971). Independent and Stationary Sequences 
of Random Variables. Wolters-Noordhoff, Groningen. 

[25] Iksanov, A. M. (2004). Elementary fixed points of the BRW smoothing transforms
with infinite number of summands. Preprint. 

[26] Liu, Q. (1997). Sur une équation fonctionnelle et ses applications: une extension du
théorème de Kesten-Stigum concernant des processus de branchement. Adv. Appl.
Prob. 29, 353-373.

[27] Liu, Q. (1998). Fixed points of a generalized smoothing transformation and appli- 
cations to the branching random walk. Adv. Appl. Prob. 30 , 85-112. 

[28] Liu, Q. (2000). On generalized multiplicative cascades. Stochastic Process. Appl. 86, 
263-286. 

[29] Liu, Q. (2001). Asymptotic properties and absolute continuity of laws stable by 
random weighted mean. Stochastic Process. Appl. 95 , 83-107. 

[30] Lyons, R. (1997). A simple path to Biggins’ martingale convergence for branching 
random walk. In Classical and Modern Branching Processes, eds. K.B. Athreya and 
P. Jagers. IMA Volumes in Math, and its Appl., vol. 84, Springer, Berlin, pp. 217- 
221. 

[31] Petrov, V. V. (1975). Sums of Independent Random Variables. Springer, Berlin. 

[32] Roesler, U. (1992). A fixed point theorem for distributions. Stochastic Process. 
Appl. 42 , 195-214. 

[33] Roesler, U. (1993). The weighted branching process. In Dynamics of complex and 
irregular systems (Bielefeld, 1991 ), Bielefeld Encounters in Mathematics and Physics 
VIII, World Science Publishing, River Edge, NJ, pp. 154-165. 

[34] Roesler, U., Topchii, V. and Vatutin, V. A. (2002). Convergence rate for stable
Weighted Branching Processes. In Mathematics and Computer Science II, eds. B.
Chauvin, P. Flajolet, A. Mokkadem. Trends in Mathematics, Birkhäuser, Basel, pp.
441-453.

Amke Caliebe 

Institut für Medizinische Informatik und Statistik, Universitätsklinikum Schleswig-Holstein,
Campus Kiel, Brunswiker Str. 10, D-24105 Kiel, Germany (corresponding address)
and
Mathematisches Seminar, Christian-Albrechts-Universität zu Kiel, Ludewig-Meyn-
Str. 4, D-24098 Kiel, Germany

caliebe@math.uni-kiel.de 




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Stochastic Fixed Points for the Maximum 

Peter Jagers and Uwe Rösler



ABSTRACT: We consider stochastic fixed point equations

\[ X \stackrel{d}{=} \sup_i T_i X_i \]

in $X \ge 0$ for known $T = (T_1, T_2, \ldots)$. The rvs $T, X_i, i \in \mathbb{N}$ are independent and
$X_i$ is distributed as $X$. We present a systematic approach in order to find solutions
using the monotonicity of the corresponding operator. These equations come up
in the natural setting of weighted trees with finitely or countably many branches.
Examples are in branching processes and the analysis of algorithms (for parallel
computing).

The above supremum equation is equivalent to

\[ F(t) = E \prod_i F(t/T_i) \]

for distribution functions. In case of a characteristic exponent $\alpha$ solutions are
known via the Laplace or Fourier transform of solutions to the stochastic fixed
point equation

\[ X \stackrel{d}{=} \sum_i T_i X_i. \]

We shall show that there are more fixed points for the supremum equation than
for the sum equation.

The prominent example is the water cascade problem. We obtain a whole
class of such solutions exploiting stochastic monotonicity.



1. Introduction 

In the abstract mathematical setting we are interested in random variables $X$ with
values in $[0, \infty]$ satisfying a fixed point equation

\[ X \stackrel{d}{=} \bigvee_i T_i X_i. \tag{1} \]

The rvs $T, X_i, i \in \mathbb{N}$ are independent and the positive $X_i$ have the same distribution
as $X$. $\bigvee$ denotes the supremum. The distribution of $T = (T_1, T_2, \ldots)$ is known.
Notice that we require no dependence assumption on the factors $0 \le T_i$, $i \in \mathbb{N}$.

The above equation is a fixed point equation for probability measures on the
extended positive reals $[0, \infty]$. Define the map $K$ by

\[ K(\mu) := \mathcal{L}\Bigl( \bigvee_i T_i X_i \Bigr) \tag{2} \]

with $T, X_i$ as above and $X_i$ having distribution $\mu$ on $[0, \infty]$. Here $\mathcal{L}$ denotes the distribution
of a rv. In terms of distribution functions $F$ the operator $K$ satisfies

\[ K(F)(t) := E \prod_i F\Bigl( \frac{t}{T_i} \Bigr). \tag{3} \]

Notice that dealing with the supremum in (1) is equivalent to dealing with the
infimum using the formulation

\[ \frac{1}{X} \stackrel{d}{=} \bigwedge_i \frac{1}{T_i} \frac{1}{X_i}. \tag{4} \]

Taking logarithms the equation (1) is equivalent to

\[ \ln X \stackrel{d}{=} \bigvee_i (\ln T_i + \ln X_i), \tag{5} \]

replacing the multiplicative group $((0, \infty), \cdot)$ by the additive $(\mathbb{R}, +)$.

Besides the mathematical importance of the above fixed points in itself they
appear naturally in the setting of probability theory on weighted trees. Consider
a weighted branching process with positive factors, i.e., iid positive rvs $T(v) =
(T_1(v), T_2(v), \ldots)$, $v \in V = \mathbb{N}^*$, with values in $[0, \infty)^{\mathbb{N}}$, where every vertex $v$ of the tree
$V$ with countably many branches has a random weight (= length) $L(v)$ recursively
given by $L(vi) = T_i(v) L(v)$, $L(\emptyset) = 1$. The largest weight $L_n = \sup_{|v|=n} L(v)$ in
the $n$-th generation satisfies in distribution the equation

\[ K(\mathcal{L}(L_n)) = \mathcal{L}(L_{n+1}). \]



The easiest examples are branching processes [2] with factors either 0 or
1. $L(v) = 1$ corresponds to the particle $v$ being alive and $L_n = 1$ corresponds to at
least one living particle in the $n$-th generation. The distribution of $L_n$ converges to
the measure $q\delta_0 + (1 - q)\delta_1$ where $\delta_x$ denotes the point measure on $x$ and $q$ is the
extinction probability of the branching process.
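
The convergence to $q\delta_0 + (1-q)\delta_1$ can be observed by iterating the operator $K$ from (3) on distribution functions. The Python sketch below does this for a finitely supported law of $T$. The encoding of the law as (probability, factors) pairs, the function names and the example law are assumptions made for this illustration; memoisation is added only to keep the nested evaluation cheap.

```python
from functools import lru_cache

def apply_K(F, law):
    """Return the function K(F)(t) = E prod_i F(t / T_i) for a finitely supported law of T.

    law is a list of (probability, factors) pairs (a hypothetical encoding).
    A factor T_i = 0 gives T_i * X_i = 0 <= t and therefore contributes the factor 1."""
    @lru_cache(maxsize=None)
    def KF(t):
        total = 0.0
        for prob, factors in law:
            prod = 1.0
            for T in factors:
                if T > 0:
                    prod *= F(t / T)
            total += prob * prod
        return total
    return KF

def delta_one(t):
    """Distribution function of the point mass at 1 (the law of L_0 = 1)."""
    return 1.0 if t >= 1 else 0.0

# Hypothetical branching example: both children alive (factors 1, 1) with
# probability 3/4, no child alive (factors 0, 0) with probability 1/4.
law = [(0.75, (1.0, 1.0)), (0.25, (0.0, 0.0))]

F = delta_one
for _ in range(60):
    F = apply_K(F, law)  # F is now the distribution function of L_n

mass_at_zero = F(0.5)    # P(L_60 <= 1/2) = P(L_60 = 0)
```

For this example the extinction probability is the smallest root of $s = 1/4 + (3/4) s^2$, namely $q = 1/3$, and `mass_at_zero` approaches that value.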

Another example is branching random walks [3]: the particles split like a
branching process and have a displacement like a random walk on $\mathbb{R}$. This is a
weighted branching process on the positive reals, using the additive group instead
of the multiplicative. The largest (= rightmost) particle has the position $L_n$ on $\mathbb{R}$.

Other examples of (1) came up in the analysis of algorithms. Suppose parallel
computing of a problem following a certain order of the partial tasks according
to the tree structure. The result is available if the last task, let us say in the $n$-th
generation, is done. For large $n$ we obtain in the limit a solution of a fixed point
equation involving the supremum. From that we may obtain estimates for the time
spent by the algorithm.

Many solutions of (3) are known in the setting of weighted branching pro-
cesses, especially for positive coefficients. The stochastic fixed point equation

\[ X \stackrel{d}{=} \sum_i T_i X_i \tag{6} \]

in $X$ is well studied. Taking Laplace transforms $\varphi(t) = E e^{-tX}$ or Fourier trans-
forms for symmetric $X$ we find

\[ \varphi(t) = E \prod_i \varphi(t T_i). \tag{7} \]

The function $x \mapsto \varphi(1/x)$ is a fixed point of (3).







A crucial role for the existence of solutions of (6) is played by the characteristic
exponent $\alpha$ defined by $E \sum_i T_i^{\alpha} = 1$. We will not stress this connection, just give
some literature: Durrett and Liggett [5], Liu [7], Lyons, Pemantle, Peres [8] on
the existence of positive solutions, Caliebe [4] and Rösler [13] for symmetric solutions.
In a broader context, especially for stochastic algorithms, these equations were
considered using the contraction method with various metrics on the space of
measures, [12], [9], [11].

However, equation (1) has more solutions than those obtainable by equation
(7) respectively (6). In Sections 4 and 5 we present in our main Theorem 5.1 a new
class of solutions including cases without a characteristic exponent $\alpha$.

For the time being let us present a concrete example in branching processes,
water cascading. Suppose every edge (respectively knot) of a binary tree gets a
value 0 or 1 by independent, Ber($p$) distributed rvs. Now we pour water into the
root of the tree.

If the edge has the value 0 the water passes immediately through and if
the edge has the value 1 then the water has one unit time delay. (Otherwise the
water has infinite speed.) At what time can we expect the first water in the $n$-th
generation? And does the water pass immediately down through all generations?

In our additive formulation every knot gets as weight the sum of all the 0s and 1s
along the path to the root. The infimum $L_n$ of the weights in the $n$-th generation
corresponds to the time of the first arrival of water in the $n$-th generation. We
obtain the recursive equation for the distribution

\[ L_{n+1} \stackrel{d}{=} (L_n^1 + T_1) \wedge (L_n^2 + T_2) \]

where $L_n^1, L_n^2, T_1, T_2$ are independent, $L_n^1, L_n^2$ have the same distribution as $L_n$
and $T_1, T_2$ are independent Bernoulli distributed to the parameter $p$ as above. If
$2p$ is strictly smaller than 1 then with strictly positive probability water will run
immediately through all generations. This implies $L_n$ converges pointwise to some rv $L$
and $P(L = 0) > 0$. Otherwise the probability is 0 and $L_n$ converges to infinity.
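
The recursion for the arrival times can be evaluated exactly on the level of tail probabilities, since the minimum of two independent delays exceeds $k$ exactly when both do. The following Python sketch computes $P(L_n > k)$ this way. The function name and the starting value $L_0 = 0$ are assumptions made for the illustration, and $T_i \sim$ Ber($p$) is read as $P(T_i = 1) = p$.

```python
def water_delay_tails(p, generations, max_delay):
    """Return [P(L_n > k) for k = 0..max_delay] with n = generations.

    Uses L_0 = 0 and L_{n+1} = (L_n^1 + T_1) ^ (L_n^2 + T_2) with independent
    copies of L_n and independent T_i with P(T_i = 1) = p, P(T_i = 0) = 1 - p."""
    q = 1.0 - p
    tails = [0.0] * (max_delay + 1)                 # P(L_0 > k) = 0 for k >= 0
    for _ in range(generations):
        new_tails = []
        for k in range(max_delay + 1):
            prev = tails[k - 1] if k >= 1 else 1.0  # P(L_n > -1) = 1
            one_branch = q * tails[k] + p * prev    # P(L_n + T > k)
            new_tails.append(one_branch ** 2)       # both branches must be late
        tails = new_tails
    return tails

# p = 1/4: 2p < 1, water runs through instantly with positive probability.
tails_sub = water_delay_tails(0.25, 200, 3)
prob_instant = 1.0 - tails_sub[0]                   # approximates P(L = 0)

# p = 3/4: 2p > 1, the probability of instantaneous passage tends to 0.
tails_super = water_delay_tails(0.75, 200, 3)
```

For $p = 1/4$ the limit of $P(L_n > 0)$ is the smallest root of $a = (qa + p)^2$, which is $p^2/q^2 = 1/9$, so `prob_instant` is close to $8/9$.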

Concerning the probability $P(L = 0)$, the argument is easy by identifying
the instantaneous appearance of water with the survival probability of a branching
process. A 0 for the edge $(v, vi)$ corresponds exactly to a living particle at $vi$, given
a living particle at $v$. The reproduction distribution for the branching process is
$p_0 = p^2$, $p_2 = q^2$, $p_1 = 2pq$ where $q = 1 - p$. This identifies the probability of
instantaneous appearance of water as the survival probability of the branching
process. However, the probability that the first water appears in all generations
at least within time $k$ is not so easy to calculate. We provide the result in the last
section via a fixed point equation

\[ L \stackrel{d}{=} (L^1 + T_1) \wedge (L^2 + T_2), \]

$L^1, L^2, T_1, T_2$ independent, $L^1 \sim L^2 \sim L$, $T_1 \sim T_2 \sim$ Ber($q$).







The next section states the problem in mathematically precise terms in the setting of
weighted branching processes. Section 3 excludes some not so interesting cases.
Section 4 provides in Theorem 4.1 the main monotonicity argument in general
and Section 5 gives a more detailed study, Theorem 5.1, on fixed points under the
additional restriction of all factors bounded by 1. The last section provides more
detailed results, especially for the water cascade problem.

There are related recent papers prepared after finishing this work by Neininger
and Rüschendorf [6] and another one by Aldous and Bandyopadhyay [1].



2. Mathematical setting 

Let $(\Omega, \mathcal{A}, P)$ be a probability space sufficiently large for our purposes. We suppress
$\omega \in \Omega$ whenever possible. Let $(V, (G, \cdot), T, L)$ be a weighted branching process
[12] on the multiplicative group $G = (0, \infty)$. For simplicity we take the adjoint
grave $\Delta$ ($\Delta \cdot g = g \cdot \Delta = \Delta$ for all $g \in G \cup \{\Delta\}$) as the point 0. The set
$V := \bigcup_{n \ge 0} \mathbb{N}^n$ denotes the vertices of a tree with countably many branches and a
root $\emptyset$. $T(v) = (T_1(v), T_2(v), \ldots)$, $v \in V$, are independent and identically distributed
random variables with values in $[0, \infty)^{\mathbb{N}}$.

Define recursively $L(v) : \Omega \to G \cup \{\Delta\} = [0, \infty)$, $v \in V$, by $L(\emptyset) = 1$ and

\[ L(vi) = T_i(v) L(v). \]

Notice $L(i) = T_i(\emptyset)$. An equivalent definition is $L(v) = \prod_{j=1}^{n} T_{v_j}(v|_{j-1})$ for a vertex
$v = (v_1, \ldots, v_n) \in \mathbb{N}^n$ of length $|v| = n \in \mathbb{N}$. The symbol $v|_j := (v_1, v_2, \ldots, v_j)$
denotes the restriction of $v$ to the first $j$ coordinates.

Let $V_n$ denote the set of vertices of length $n$. Throughout the paper we assume
the random measure

\[ \nu_n := \sum_{v \in V_n, \, L(v) \in G} \delta_{L(v)} \]

to be a Radon measure on $G = (0, \infty)$, taking finite values on all compact sets
(relative to the induced topology). We use $\nu := \nu_1$ and $\nu(1) := \nu(\{1\})$, $\|\nu\| :=
\nu(G)$, which might be $\infty$. Notice $\nu$ is a random measure and also $\nu(1)$, $\|\nu\|$ are
random.

Define

\[ L_n := \sup\{L(v) \mid v \in V_n\}, \]
\[ L_n^i := \sup\Bigl\{ \frac{L(v)}{L(i)} \Bigm| v \in V_{n+1}, \, v_1 = i \Bigr\}, \]

where $\frac{L(v)}{L(i)}$ is the intuitive writing for the well defined expression $\prod_{j=2}^{n+1} T_{v_j}(v|_{j-1})$.
We shall use the intuitive writing whenever possible.

The $L$-rvs satisfy the following recursive equation

\[ L_{n+1} = \bigvee_{i \in \mathbb{N}} L(i) L_n^i \tag{8} \]

as rvs. Notice $(L(i))_{i \in \mathbb{N}} = T(\emptyset)$, $L_n^i$, $i \in \mathbb{N}$, are independent and all $L_n^i$ have the same
distribution as $L_n$. The symbol $\bigvee$ denotes the supremum.
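
For a tree with finitely many children per vertex the weights and the generation maxima can be computed directly from the product formula. The Python sketch below does so; the encoding of vertices as tuples of 1-based indices, the function names and the deterministic example factors are assumptions made for this illustration.

```python
from itertools import product

def weight(v, T):
    """L(v) = prod_j T_{v_j}(v|_{j-1}) for a vertex v given as a tuple of 1-based indices.

    T maps a vertex (tuple) to the tuple (T_1(v), T_2(v), ...) of its factors."""
    L = 1.0
    for j, child in enumerate(v):
        L *= T(v[:j])[child - 1]
    return L

def largest_weight(n, branching, T):
    """L_n = max{L(v) : |v| = n} when every vertex has `branching` children."""
    return max(weight(v, T) for v in product(range(1, branching + 1), repeat=n))

def constant_factors(v):
    # Hypothetical deterministic factors, identical at every vertex.
    return (0.5, 0.25)

L3 = largest_weight(3, 2, constant_factors)
L4 = largest_weight(4, 2, constant_factors)
```

With identical factors at every vertex, recursion (8) reduces to $L_{n+1} = \max(0.5, 0.25) \cdot L_n$, which the two computed values reproduce.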







Define the map $K$ acting on probability measures on $[0, \infty]$ to probability
measures on $[0, \infty]$ via

\[ K(\mu) := \mathcal{L}\Bigl( \bigvee_{i \in \mathbb{N}} T_i X_i \Bigr), \tag{9} \]

$\mathcal{L}$ denoting the distribution of a rv. The rv $T = (T_1, T_2, \ldots)$ (on a not specified
probability space) has the same distribution as the $T(v)$ above. The rvs $X_i$, $i \in \mathbb{N}$,
(on the same not specified probability space) all have the distribution $\mu$. The rvs
$T, X_i, i \in \mathbb{N}$, are independent. The symbol $\stackrel{d}{=}$ denotes equality in distribution. We
use the convention $0 \cdot \infty = 0$ and $x \cdot \infty = \infty$ otherwise.

Notice that (8) rewrites in the weaker form: $K(\delta_1)$ is the distribution $\mathcal{L}(L_1)$
of $L_1$ and in general $K(\mathcal{L}(L_n)) = \mathcal{L}(L_{n+1})$.

On the space of probability measures introduce the stochastic order via

\[ \mu \preceq \nu \iff \int f \, d\mu \le \int f \, d\nu \]

for all positive increasing functions $f$. This order is equivalent to $F_\mu \ge F_\nu$ point-
wise for the distribution functions. Actually the set of probability measures on
$[0, \infty]$ with the stochastic order is a lattice and $K$ acts on this space via a lattice
operation. For simplicity we shall work with distribution functions instead of the
lattice of measures.

Let $\mathcal{C}$ be the set of all increasing, right continuous functions $F : (0, \infty) \to
[0, 1]$. For notational reasons we use $F(\infty) = 1$. There is a bijective map between prob-
ability measures on $[0, \infty]$ and elements of $\mathcal{C}$. Therefore we call an element of $\mathcal{C}$ a
distribution function. The map $K$ acts, recall the bijection, on $\mathcal{C}$ to $\mathcal{C}$

\[ K(F)(t) = E \prod_{i \in \mathbb{N}} F\Bigl( \frac{t}{T_i} \Bigr), \tag{10} \]

$t \in (0, \infty)$. The $n$-th iterate of $K$ writes

\[ K^n(F)(t) = E \prod_{v \in V_n} F\Bigl( \frac{t}{L(v)} \Bigr). \]

If $F_n$ denotes the distribution function of $L_n$ then $K(F_n) = F_{n+1}$.

We have chosen here right continuous distribution functions. These are natu- 
rally more suitable for the supremum. Considering the infimum, the left continuous 
distribution functions are more suitable, since the corresponding map K (apply 
the map x e G to its inverse ^) 

K(F){t) = El[[F{tT,) 

ieN 

with F{t) := 1 — F{t—) := 1 — lim^/'t F{s). 

In our case of a Radon measure u we could actually take left or right versions 
of F, and the fixed point equation rewrites 

We sometimes use the operator K also for increasing left continuous functions, if 
appropriate. The discussion for left or right continuous functions would disappear 
using the lattice setting of the measures.) 







Remark: We consider here the supremum of all $L(v)$, $v \in V$. The argument
for the infimum is the same using the map $[0, \infty] \ni x \mapsto 1/x \in [0, \infty]$. Instead of
$T_i$ consider $1/T_i$ for $T_i > 0$. Now the grave is $\infty$ and we would use the convention
$0 \cdot \infty = \infty$. The fixed point equation (1) rewrites to



1 

X 




1 1 
TiYi' 



A translation of $F \in \mathcal{C}$ with respect to $a > 0$ is the function $F_a \in \mathcal{C}$ defined
by $F_a(x) = F(ax)$.

Lemma 2.1. The set $\mathcal{C}$ is convex and closed under translations, finite sums, finite
or countable products and decreasing limits.

Let $T_1, T_2, \ldots$ be positive rvs. The map $K : \mathcal{C} \to \mathcal{C}$ from (10) is well defined,
is monotone and $\sigma$-continuous from above.

If Fn ^ G is a pointwise increasing sequence and P{Yli > 0) = 1 for 

all positive t then KF^ increases to K lim^^ F^ at all continuity points of K lim^ 



Proof: The right continuous increasing functions are the upper semicontinuous
increasing functions. The set of upper semicontinuous increasing functions
is convex and closed under translations, finite sums and decreasing
limits. Using the positivity of the functions, the set is closed under finite products.
Since the functions are bounded by 1, the set is closed under countable products.

The map $K$ from $\mathcal{C}$ to some function space is well defined. The map is order
preserving, using the pointwise order on the function space. The function $K : \mathcal{C} \to \mathcal{C}$
is well defined, since $\lim_{x \downarrow t} K(F)(x) = K(F)(t)$ by the Monotone Convergence
Theorem.

The monotone operator $K$ is called $\sigma$-continuous from above if

$$\mathcal{C} \ni F_n \searrow_n F \implies K(F_n) \searrow_n K(F) \in \mathcal{C}.$$

The Monotone Convergence Theorem implies the $\sigma$-continuity of $K$ from above.

The $\sigma$-continuity from below follows also by the Monotone Convergence Theorem.
The condition is needed to apply the theorem. q.e.d.

Using left continuous versions for an increasing sequence $F_n$ we obtain that $\lim_n F_n$
is a left continuous increasing function and $\lim_n KF_n = K \lim_n F_n$. In terms of
stochastic convergence of measures no distinction between left and
right continuous versions would be necessary.



Corollary 2.2. If || z^ll is finite almost everywhere then for some F G G 

K{F){t-) ^ eY[f . 

ien \ ^ / 

for all t > 0 is equivalent to F being a fixed point. 



( 12 ) 



Proof: An a.s. finite $\|\nu\|$ implies the condition needed to apply the Monotone
Convergence Theorem. The rest is now obvious from the previous lemma. q.e.d.



3. The set of fixed points 

In this section we study the set $\mathcal{F}$ of fixed points of $K$ in some generality. Obviously
the constant function 1 is always a fixed point. We call this the trivial fixed point
and suppress it if possible. We first treat completely the case $\|\nu\| \le 1$ a.e., the case
$\nu(1) = \|\nu\|$ a.e., and the case $P(\nu(t, \infty) = \infty) > 0$ for some $t > 0$. Afterwards all
these cases are excluded and we collect general results on $\mathcal{F}$.

Case $\|\nu\| \le 1$. The case $\|\nu\| \le 1$ corresponds, after renaming, to the equation

$$X \overset{d}{=} TX. \tag{13}$$

Proposition 3.1. Assume $\|\nu\| \le 1$.

If $T \equiv 1$ then any probability measure on $[0, \infty]$ is a fixed point of $K$.
Otherwise, if $P(\|\nu\| = 0) > 0$ then $K$ has only the trivial fixed point and if
$P(\|\nu\| = 0) = 0$ then the fixed points of $K$ are $\delta_0$ and $\delta_\infty$.



Proof: The first statement is easy. 

For notational simplicity we use $T$ instead of $T_1 \ge 0$ and use $T_2 = 0 = T_3 = \cdots$.
If $P(T = 0)$ is strictly positive then the equation (13) implies

$$P(X = 0) = P(T = 0) + P(T \ne 0) P(X = 0),$$

which in turn implies $X = 0$.

Now let $T$ be not identically 1 and take values in $(0, \infty)$. The probability
measures $\delta_0$ and $\delta_\infty$ are solutions of (13). Suppose the equation (13) has another
solution $F$. The function $G$ defined by

G{t) := E(Xlx<t) = [ ydF{y) 

Jm 



has the following properties: 

• G is right continuous and increasing. 

• limt_,o ^ = 0 

• limt_oo ^ = 0 

^ ^ T ) 

t ^ 

The first statement is easy. The second follows by G{t) < t dF(y) = 
t(F(t) - F(0+)). The third statement follows by 

lim ydF{y) + \ vdF{y) 

lim(F(t) - F(io)) = lim F{t) - F{to) -^to^oo 0. 

t t— ^OO 



lim 

t-^oo 



G{t) 



< 



The fourth statement follows by 

G{t) = E{XIx<t) - E{TXlTx<t) = E{TE(XlTx<t \ T)) = E{TG{^)). 

Let ti,t 2 be the infimum and supremum of the set of values, where the right 
continuous function t takes its supremum. (Notice the points of discontinu- 

ity are of the first kind and the function increases at every point of discontinuity. 
Therefore 0 < < ^2 < oo exist and are well defined.) We obtain 

G{tj) ^ 

f . ’ 

i = 1,2. This implies ^ is a.e. a supremum of the function t But this 

implies T < 1 for z = 1 and T > 1 for i = 2. But T = 1 was excluded. q.e.d. 

Case u{l) = \\u\\ and not \\u\\ < 1 a.e.. 







The case $\nu(1) = \|\nu\|$ corresponds to values 0 or 1 of the factors $T_i$, $i \in \mathbb{N}$. The
corresponding weighted branching process is a branching process. The fixed point
equation $KF = F$ rewrites

$$KF(t) = E\, F(t)^{\nu(1)}. \tag{14}$$

Possible values of $F(t)$ are the fixed points of the generating function

$$[0, 1] \ni s \mapsto f(s) = \sum_{k=0}^{\infty} p_k s^k \tag{15}$$

with $p_k := P(\nu(1) = k)$. (Notice $\nu(1) < \infty$.) There are at most two solutions
($f$ the identity is excluded). One solution is always the value 1. There is another
solution in $[0, 1)$ iff $E\nu(1) > 1$.

Proposition 3.2. Assume $P(\|\nu\| \le 1) < 1$ and $\nu(1) = \|\nu\| < \infty$. The fixed points
of $K$ are all functions in $\mathcal{C}$ taking as values only the fixed points of the generating
function $f$ (15).

Case P{i'{t, oo) = oo) > 0 for some 0 < t. 

Proposition 3.3. Assume P{iy{t, oo) = oo) > 0 for some 0 <t. Let q he the smallest 
fixed point of [0, 1] 3 5 Then the fixed points of K are all functions 

in C taking as values only the fixed points of [0, 1] 3 s 

Proof: Using Y := z/(0, cxd) and excluding X = 0 the value P{X = 0) < 1 is 
a fixed point of s i— > Es^ since 

E{Ix^o) = Eilycoo^vTiXi^O + E{lY::^oo^\/TiXi^o) 

= E{Iy<ooIvt,x,=o) = E{P^{X = 0)). 

Let c G (0, oo] be the essential supremum of X. Then 

Elx<c = E{lY<oo^yTiXi<c + E{lY=ooKTiXi<c)) 

= E{Iy<ooIvt,x,<c) = E{P^{X < c)). 

is again a fixed point of [0, 1) 3 s Es^ . There is at most one fixed point. This 
leads to the possible solutions Cj{X) = qSq + (1 — ^)5c? c G [0, oo]. All these are 
solutions. q.e.d. 

We collect now a few general results on the set T of fixed points excluding 
the previous cases. 

Proposition 3.4. Any multiplicative translate (0, oo) 3 1 1 -> F(at), a> 0 of a fixed 
point F eT is a fixed point. 

The right side limit F{0-\~) = limt^o F{t) at 0 of a fixed point F solves the 
equation 

c = (16) 

in c. (Use here the convention 0^ = 1.^ 

There is at most one solution of equation (16) for c G [0,1). There is a 
solution iff EWuW > 1. 

The function identical to a constant c G [0, 1] is a fixed point iffc solves (16). 







Proof: The first statement is obvious. 

For any solution F G T we have 

F(0+) = limF(i) = lim KF{t) = Kl\mF{t) = £[Jf(0+) = £JFII‘'H(0+). 

i 

Define the generating function g of \\i/\\, g{s) := for s G [0, 1]. The fixed 
points of g are exactly the solutions of equation (16). If F(||i^|| < oo) = 1 then the 
statement on fixed points of g is well known in the literature of branching processes 
[2]. The same arguments apply in the general case, including F(||i^|| < oo) < 1. 
We skip the details. q.e.d. 

Proposition 3,5. Let \\u\\ < oo a.e. If the measure Eu has no discrete mass then 
any fixed point is continuous. If the measure Ev has no discrete mass besides in 1 
then any fixed point is continuous besides a possible jump 1. // there is a jump 
to 1 at t then F(sup^ < 1) = 1, Ev(l) > 1 and F{t—) is the unique fixed point 
of the generating function f (15) in [0, 1). 

Proof: Let D be the set of discontinuities of the fixed point F. D is at most 
countable, since the fixed point F is increasing and can have at most countable 
many points of discontinuity of the first kind. For the following estimate note that 
for fixed t the set of all u; G satisfying G D if Ti(a;) / 1 is a null set relative 
to the measure Eu, 

Y,P{t 6 DTi, Ti^l) = - 0 . 



Therefore and by Corollary 2.2 we obtain 
F{t) 

and recall (12) 

F{t-) 

In case i/(l) = 0 we are done. Otherwise both values F{t) and F{t—) satisfy the 
equation 

s = Es^^^^b 

with s G [0, 1] and b = ri{i|Ti/i} P (^) • 

Consider the function [0, 1] 3 s g{s) = The function g is convex. 

In case F(0 < u{l) < oo, 6 > 0) = 0 the function g is independent of s. Therefore g 
has exactly one fixed point and F{t) = F{t-). In case F(0 < u{l) < oo, 6 > 0) > 0 
the function g is strict convex. Therefore g has at most two fixed points. By 
9(0) > 0 and ^(1) = Eb < 1 there is at most one fixed point in [0, 1). If F{t) < 1 
we are done. It remains the case F{t—) < F{t) — 1. This implies ^(1) = F6 = 1 
and consequently 6=1 and sup^ < 1 a.e. F{t—) is the unique fixed point of g 
in [0,1). q.e.d. 








4. Fixed points by monotonicity 

Like in potential theory the monotonicity of $K$ will provide some fixed points. We
determine the fixed points in case of $\sup_i T_i \le 1$.

We always exclude the cases $\|\nu\| \le 1$ and $\nu(1) = \|\nu\|$.

Theorem 4.1. Suppose some $F \in \mathcal{C}$ satisfies $F \ge KF$. Then the sequence $K^n F$ is
pointwise decreasing in $n$ to some fixed point of $K$.

Suppose some $F \in \mathcal{C}$ satisfies $F \le KF$ and $P(\prod_i F(t/T_i) > 0) = 1$ for all
$t > 0$ or $\|\nu\| < \infty$ a.e. Then the sequence $K^n F$, $n \in \mathbb{N}$, is pointwise increasing in
$n$ to the left continuous version of some fixed point of $K$.

Proof: Since $F \ge KF$ we obtain by induction $K^n F \ge K^{n+1} F$. The sequence
$K^n F$ decreases pointwise to some function $G \in \mathcal{C}$. By the $\sigma$-continuity from above
we obtain

$$K(G) = K(\lim_n K^n F) = \lim_n K(K^n F) = \lim_n K^{n+1} F = G.$$

The argument for $F \le KF$ is analogous. The imposed condition provides the
interchangeability of the limit. Start with the left continuous version of $F$. Then
$K^n F$ is left continuous and the limit $G = \lim_n K^n F$ is left continuous. The right
continuous version of $G$ is a fixed point of $K$ in $\mathcal{C}$. q.e.d.

Using the stochastic order $\preceq$ on probability measures the above theorem reads:
If $K(\mu) \preceq (\succeq)\ \mu$ for a probability measure $\mu$ on $[0, \infty]$ then the sequence
$K^n \mu$, $n \in \mathbb{N}$, decreases (increases) in stochastic order to a fixed point of $K$.

The set T is partially ordered with the stochastic order. Define the infimum 
and supremum Vgr in T by 

FAgrG:=limF^(FAG) 

n 

FV 3 rG:= limA'"(FvG). 

n 

Lemma 4.1. Assume F(||z^|| > 1) > 0 and \\u\\ < oo a.e.. Then (T, <, Vgr, Ag^) 
is a lattice. This lattice has a minimal element given by the function = 0 and a 
maximal element given by the function = 1. The lattice is complete, i.e., every 
(bounded) set has a minimal upper bound and a maximal lower bound in 7. 

We skip the details. 



5. Factors bounded by 1 

In this section we determine the fixed points in case of $\sup_i T_i \le 1$. We always
exclude the cases $\|\nu\| \le 1$ and $\nu(1) = \|\nu\|$.

Let $(Z_n)_n$ be a branching process with offspring distribution $p_k := P(\nu(1) = k)$,
$k \in \mathbb{N}_0$. Let $q_n := P(Z_n = 0)$ be the extinction probability of $Z_n$ and
$q = \lim_n q_n$ be the extinction probability of the branching process $(Z_n)_n$. Notice
$f(q_n) = q_{n+1}$ where $f$ is the generating function $f(s) = \sum_{k=0}^{\infty} p_k s^k$ of $(p_k)_k$.
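
The iteration $q_{n+1} = f(q_n)$ can be tried out numerically. The following is a minimal sketch, not part of the original text; the function name `extinction_probabilities` and the example offspring distribution are illustrative assumptions.

```python
def extinction_probabilities(offspring_pmf, steps=200):
    """Return [q_0, ..., q_steps] with q_0 = 0 and q_{n+1} = f(q_n).

    offspring_pmf[k] is p_k, the probability of k children
    (a finite list is assumed here purely for illustration).
    """
    def f(s):
        # generating function f(s) = sum_k p_k s^k
        return sum(pk * s ** k for k, pk in enumerate(offspring_pmf))

    qs = [0.0]  # q_0 = P(Z_0 = 0) = 0 because Z_0 = 1
    for _ in range(steps):
        qs.append(f(qs[-1]))
    return qs


# Hypothetical example: p_0 = 1/4, p_1 = 1/4, p_2 = 1/2 (mean offspring 5/4 > 1).
qs = extinction_probabilities([0.25, 0.25, 0.5])
print(qs[-1])  # approaches the smaller root of f(s) = s
```

For this example the equation $f(s) = s$ has the roots $1/2$ and $1$, and the iterates increase towards $1/2$.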

Let $F_0 := 1_{[1, \infty)}$ and $F_n := K(F_{n-1})$. We obtain $F_n = K^n(F_0)$ by iteration.

Theorem 5.1. Assume $\|\nu\| < \infty$ a.e. and $\sup_i T_i \le 1$.

(i) The sequence $F_n$ of right continuous distribution functions increases pointwise
to some function $F$. The right continuous version of $F$ is a fixed point
of $K$.

(ii) $q_n = F_n(1-)$ and $q = F(1-)$.

(iii) $F = F_0$ is equivalent to $P(\nu(1) = 0) = 0$.

(iv) The support of $F$ is the positive real line unless $\sup_i T_i < 1$ a.e.

(v) If $\sup\{T_i \mid T_i < 1\} \le c < 1$ a.e. then $q_n = F_n(t)$ and $q = F(t)$ for any
$t \in [c, 1)$.

(vi) $F \equiv 1$ is equivalent to $q = 1$.

Proof: i) For ^ > 1 

K{Fo){t) = E]lFo(Fj='^ = m 

and for t < 1 is KF(){t) > 0 = Fo{t). By iteration we obtain = K{Fn) > 
K{Fn-i) = Fn. The sequence Fn increases pointwise to some function F. Lemma 
4.1 provides the fixed point property for the right continuous version. 

ii) We show this by induction. The induction starts with Fo(l— ) = 0 = qo 
and the induction step n to n -h 1 is 

F„+i(l-) - KFn+i{l-) =El[Fn = EF::^^\1-) = /(g„) = qn+1. 

Since Fn increases to F pointwise we obtain F{1—) = q. 

iii) F = Fo is equivalent to ^ = 0 and this is equivalent to u{l) > 1 a.e.. 

iv) Let to be the left endpoint of the support of F. Assume 0 < to < 1* Then 

0 = F(io-) = KF{to~) ^eY[F f^) = EF^^^\to-)b = 

where b = nTi<i appropriate. We obtain F(sup^Ti < 1) = 0 and conse- 

quently 1 /( 1 ) = \\iy\\. But this case was excluded. Consequently to is 0 or 1. 

v) For c < t < 1 we obtain 

F{t) = KF{t) = EY[F‘^^^\t) = f{F{t)). 

i 

F{t) is a fixed point of / and therefore either q or 1. By F(l— ) = q we conclude 
F(t) = q. li q = 1 then consider the infimum t\ of all times t satisfying F{t) = 1. 
By the above ti < 1. If 0 < ti < 1 then the function t F{tit) is a fixed point 
and larger or equal than Fq. By the monotonicity of K we conclude F{ty )>F{-). 
Therefore F(titi) > F{t\) = 1 in contradiction to the definition of t\. The only 
way out is = 0 and F =1. 

vi) F = 1 implies ^ = 1. Now the reverse. Assume first Ev{l) < 1. Then 
there is an 6 > 0 such that Fi/((1 — e, 1]) < 1. Define the factors 

rj. Ti if Ti<l-e 

* ‘ I 1 else 

Then Ln = sup^^y^ L{v) < Ln- Since Fn is the distribution function of Ln we 
obtain Fn > F^ and in the limit F > F. By the last partial claim Ln and Ln 
converge to 0 and F = 1 = F. 

We come now to the case F(z/(1)) = 1. (The given proof would work also 
for the case Fi/(1) < 1.) The rv r gives every infinite path u G the smallest 
natural numbers, such that the weight of v\n is strictly smaller than 1, 

t{v) := inf{n | L{v\n) < !}• 







(r is stopping time in an appropriate sense for trees.) Let X be a solution of (1). 
Define independent copies X{v),v G V. Then by induction for every n G N 

where An = {v\r^n I ^ Ki}* And again by induction 

X 1 \/,^aL{v)X{v) 

where A = limsup^ An- Notice A is still countable and the measure z> = Ylv ^L{v) 
is a.e. a Radon measure. (For the last statement use a finite extinction time for 
a critical branching process.) Interpreting the L{v),v G A, as factors in the last 
equality we conclude by our results, case £’/>(!) = 0, that X is identically 0. q.e.d. 

Analogously to the construction via an increasing sequence of distribution 
functions we can use a decreasing sequence (and right continuous versions). Let q 
be the extinction probability as above and define 

Go = I[l,oo) + ^1(0,1)- 

Lemma 5.1. Assume \\u\\ < oo and sup^T^ < 1. 

K^Gq decreases pointwise to a fixed point of K. This fixed point is F as 
above. 



Proof: Let ^ < 1. 

K(G„)(t) = 



®nco(^) 

i ' ^ 

n Go(^) 

r.i'F. 



{i|Ti<l} 

< Eq‘'‘'^h = q. 



This proves Go > KGq. The sequence K^Gq is decreasing and converges pointwise 
to some function called G. By K^Gq > K'^Fq we obtain G > F. 

Assume G > F which can happen only for 0 < 9 < 1. Let t 2 be the smallest 
real satisfying G = F on [^ 2 , 1). We consider first the case sup{T^ |T^<l}<c<l 
a.e.. for a constant c. Then for any ct 2 <t < t 2 the values F{t) and G{t) are fixed 

points of the map s with the iy b = riTi<i ^ (e) ~ riTi<i G • 

Like for branching processes consider the function q(s) := Ec^^^^b. This is a convex 
function in 0 < s < 1. It is not difficult to establish ^(0) > 0 and q(l) < 1 in our 
setting. Therefore s = g(s) has exact one solution in [0, 1). This implies F{t) = G{t) 
in contradiction to the choice of t 2 and the assumption. 

Now we consider the general case without the restriction. For e > 0 consider 
the factors 

if Ti<l-e 
^ * I 1 else 



r T,A(l-e) if Ti<l 

1 1 else 



Then F>F>F and G>G>G.BjF = G and F — G it suffices to show F 
and F converge to F as 6 decreases to 0. For this it suffices to show F^ and Fn 
converge to F^ as c decreases to 0 for any given n. However this is easy considering 
the corresponding rvs Ln = sup^^y^ L{v) and analogously L^/Ln- q.e.d. 







Corollary 5.2. Assume ||i/|| < oo and sup^T^ < 1. 

Besides the trivial fixed point 1 there is in case q = 1 none and in case q <l 
up to translation only one fixed point F satisfying F{t) = 1 for some t E 



6. Water cascades example 

We return now to the toy example from the beginning. The state space is the
additive group of integers. Let $V = \bigcup_{n \ge 0} \{1, 2\}^n$ denote the knots of a binary
tree with root $\emptyset$. Let $T_i(v) \in \{0, 1\}$, $v \in V$, $i \in \{1, 2\}$, be independent identically
distributed random variables with Bernoulli distribution with parameter $p$. Define
the weight $L(v)$ of a knot $v$ recursively by

$$L(vi) = L(v) + T_i(v)$$

and $L(\emptyset) = 0$. The weight $L(v)$ represents the number of all 1's on the path from
the knot $v$ to the root. (The edge $(v, vi)$ carries a 1 iff $T_i(v) = 1$.) More precisely

L[v) = 

where the sum is over all ancestors $w \ne v$ of $v$. Define $L_n := \inf_{|v| = n} L(v)$. Then
$L_n$ satisfies the backward equation [10]

$$L_{n+1} \overset{d}{=} (L_n^1 + T_1) \wedge (L_n^2 + T_2).$$

Here $L_n^1, L_n^2, T_1, T_2$ are independent, $L_n^1, L_n^2$ have the same distribution as $L_n$ and
$T_1, T_2$ are Bernoulli($p$) random variables.
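
The backward equation can be iterated directly on the survival function $P(L_n \ge k)$. The sketch below is an illustration only and not part of the original text; it assumes that Bernoulli($p$) means $P(T_i = 1) = p$, and the name `survival_after` is hypothetical.

```python
def survival_after(n, p, k_max):
    """Return s with s[k] = P(L_n >= k) for k = 0..k_max.

    Uses L_{n+1} = min(L^1_n + T_1, L^2_n + T_2) with independent branches,
    assuming P(T_i = 1) = p and P(T_i = 0) = 1 - p.
    """
    # L_0 = 0, so P(L_0 >= 0) = 1 and P(L_0 >= k) = 0 for k >= 1
    s = [1.0] + [0.0] * k_max
    for _ in range(n):
        new = [1.0]
        for k in range(1, k_max + 1):
            # P(L + T >= k) = (1 - p) P(L >= k) + p P(L >= k - 1)
            branch = (1.0 - p) * s[k] + p * s[k - 1]
            # the minimum of two independent branches is >= k iff both are
            new.append(branch * branch)
        s = new
    return s


print(survival_after(3, 1.0, 5))  # with p = 1 every edge carries a 1, so L_3 = 3
```

With $p = 1$ the sketch returns mass one on $L_n = n$, and with $p = 0$ it returns $L_n = 0$, which are the two degenerate cases of the recursion.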

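In the multiplicative setting we consider the multiplicative group $\{e^n \mid n \in \mathbb{Z}\}$.
Taking the map $x \mapsto e^{-x}$ and the notation $\tilde L_n = e^{-L_n}$ and $\tilde T_i(v) = e^{-T_i(v)}$ we
obtain the backward equation

$$\tilde L_{n+1} \overset{d}{=} (\tilde T_1 \tilde L_n^1) \vee (\tilde T_2 \tilde L_n^2), \tag{17}$$

the standard form used in our setting.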



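Proposition 6.1. The distribution of $\tilde L_n = e^{-L_n}$ is given by $F_n = K^n F_0$, $n \in \mathbb{N}$, where
$F_0 = 1_{[1, \infty)}$. The limiting distribution $F$ is non degenerate iff $2p < 1$ and in that
case given by

$$F(t) = \begin{cases} 1 & \text{if } t \ge 1, \\ a_n & \text{if } t \in [e^{-n}, e^{-n+1}). \end{cases}$$

The values $a_n$ and $a_0 = 1$ satisfy the recursive equation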



a„ = {I - p)al + pal_i 



and are uniquely determined by this equation. 



Proof: A short calculation shows $F_1 = KF_0$ corresponds to the distribution
of $\tilde L_1 = e^{-L_1}$. By induction we obtain that $F_n$ is the distribution function of $\tilde L_n = e^{-L_n}$. The limiting
distribution is not trivial, Lemma 5.1, if and only if the extinction probability is
strictly less than 1. This is equivalent to $E(1 - T_1) + E(1 - T_2) > 1$, which provides
the criterion $2p < 1$. Via induction on $n$ show $F$ has the above form and the $a_n$
are determined (uniquely) by the equation. We skip the details. q.e.d.

We considered here rvs (=water delays) on the edges. The argument with 
delays on the knots is, after reformulation, a special case. 







References 

[1] D. Aldous and A. Bandyopadhyay, A Survey of Max-Type Recursive Distributional
Equations, http://front.math.ucdavis.edu/math.PR/0401388 (2004).

[2] K. Athreya and P. Ney, Branching Processes. Springer, Berlin 1972.

[3] J.D. Biggins, Martingale convergence in the branching random walk. Journal of
Applied Probability 14, 25-37 (1977).

[4] A. Caliebe, Symmetric fixed points of a smoothing transformation. Advances in
Applied Probability 35, 377-394 (2003).

[5] R. Durrett and M. Liggett, Fixed points of the smoothing transformation. Zeitschrift
für Wahrscheinlichkeitstheorie und verwandte Gebiete 64, 275-301 (1983).

[6] R. Neininger and L. Rüschendorf, Analysis of algorithms by the contraction method:
additive and max-recursive sequences. Preprint 2003.

[7] Qu. Liu, Fixed points of a generalized smoothing transformation and applications to
the branching random walk. Advances in Applied Probability 30, 85-112 (1998).

[8] R. Lyons, R. Pemantle and Y. Peres, Conceptual proofs of L log L criteria for mean
behavior of branching processes. Annals of Probability 25, 1125-1138 (1995).

[9] S.T. Rachev and L. Rüschendorf, A new ideal metric with applications to multivariate
stable limit theorems. Probability Theory and Related Fields 94, 163-187 (1992).

[10] U. Rösler, A fixed point theorem for distributions. Stochastic Processes Appl. 42,
195-214 (1992).

[11] U. Rösler, V. Topchii and V. Vatutin, Convergence conditions for the weighted
branching process. Discrete Mathematics 12 nr. 1 (2000).

[12] U. Rösler and L. Rüschendorf, The contraction method for recursive algorithms.
Algorithmica 29, 3-33 (2001).

[13] U. Rösler, A fixed point equation for distributions. Berichtsreihe des Mathematischen
Seminars Kiel, Christian-Albrechts-Universität zu Kiel, www.numerik.uni-kiel/reports/
Bericht 98-7 (1998).

Peter Jagers 

Mathematical Statistics 
Chalmers University of Technology 
S-412 96 Göteborg
Sweden 

jagers@math.chalmers.se 

Uwe Rösler

Mathematisches Seminar
Christian-Albrechts-Universität zu Kiel
Ludewig-Meyn-Strasse 4 
24098 Kiel 
Germany 

roesler@math.uni-kiel.de 




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



The Number of Descents in Samples of 
Geometric Random Variables 

Arnold Knopfmacher* and Helmut Prodinger^ 



ABSTRACT: For words of length n, generated by independent geometric 
random variables, we consider the mean and variance, and thereafter the distribu- 
tion of the number of descents in the words. The cases of strict and weak descents 
are both considered. We also study the position and height of the first descent as 
well as the size of the greatest descent over all words of length n. 



1. Introduction 

Let $X$ denote a geometrically distributed random variable, i.e. $P\{X = k\} = pq^{k-1}$
for $k \in \mathbb{N}$ and $q = 1 - p$. The combinatorics of $n$ geometrically distributed
independent random variables $X_1, \ldots, X_n$ has attracted recent interest, especially
because of applications in computer science. We mention just two areas, the skip
list [1, 13, 16, 8] and probabilistic counting [3, 6, 7, 9].

One of the first combinatorial questions investigated for words $a_1 \ldots a_n$, with
the letters $a_i$ independently generated according to the geometric distribution,
was the number of left-to-right maxima in [14]. In [10] the study of left-to-right
maxima was continued, but now the parameters studied were the mean value and
mean position of the $r$-th maximum.

In this article we study descents in a string of $n$ geometrically distributed
independent random letters. For example in $w = 221114431$ we have 3 strict and 7
weak descents (equality is included). In the sequel we denote by $D_n(w)$ the number
of descents (strict or weak as indicated) in the word $w$, where $w$ is of length $n$.
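
The example word can be checked mechanically. The following small sketch is an illustration added for concreteness; the helper name `count_descents` is not from the original text.

```python
def count_descents(word, strict=True):
    """Count adjacent pairs (a, b) with a > b (strict) or a >= b (weak)."""
    pairs = list(zip(word, word[1:]))
    if strict:
        return sum(1 for a, b in pairs if a > b)
    return sum(1 for a, b in pairs if a >= b)


w = [int(c) for c in "221114431"]
print(count_descents(w, strict=True), count_descents(w, strict=False))  # 3 7
```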

In section 2 we study the mean and variance of Dn{w). Thereafter, in sec- 
tion 3 we study the distribution of the number of descents, which turns out to be 
Gaussian. Subsequently, in sections 4 and 5 we study the average position and 
height of the first descent. Finally, in section 6 we determine the expected value of 
the size of the greatest descent in a word of length n. By reversing the strings we 
see that exactly the same results hold for ascents as for descents in such words. 
Our treatment to some extent parallels the study of runs in geometric random 
variables by Grabner and the present authors in [4]. 

We make use of the following abbreviations: $p = 1 - q$, $Q = 1/q$, $L = \log Q$.
Words have length $n$.



*This material is based upon work supported by the National Research Foundation under grant 
number 2053740 

+This material is based upon work supported by the National Research Foundation under grant 
number 2053748 







2. The number of descents 

As always in the setting of geometrically distributed random variables, we have
two choices: to use the '>' or the '≥' sign. Let us do the instance of strict descents
first.



2.1. Strict descents 

In order to determine the mean and variance of the number of strict descents we
will make use of the following decomposition of the set of all words. Here $\{\ge k\}$
denotes the set $\{k, k+1, \ldots\}$; for a given set $A$ we denote

$$A^+ = \bigcup_{k=1}^{\infty} A^k, \qquad A^* = \varepsilon \cup A^+,$$

where $\varepsilon$ stands for the empty word. We decompose the set of all words according
to runs of 1's, separated by words consisting of larger digits only:

$$\{\ge 1\}^* = 1^* (\{\ge 2\}^+ 1^+)^* \{\ge 2\}^+ (\varepsilon + 1^+) + 1^*. \tag{1}$$



We consider a probability generating function $F(z, u)$, where $z$ labels the
number of random variables (or length of the word, if you wish), and $u$ counts the
number of descents. We should always have $F(z, 1) = \frac{1}{1-z}$, which we might use as
a check, and a replacement of $z$ by $qz$, if we increase all letters by 1,

$$F(z,u) = \frac{1}{1-pz} \cdot \frac{1}{1 - [F(qz,u) - 1]\frac{pzu}{1-pz}} \cdot [F(qz,u) - 1] \Big(1 + \frac{pzu}{1-pz}\Big) + \frac{1}{1-pz}. \tag{2}$$

Now we differentiate it w.r.t. $u$, plug in $u = 1$, set $G(z) = \frac{\partial}{\partial u} F(z, u)\big|_{u=1}$, and get

$$G(z) = G(qz) \frac{(1-qz)^2}{(1-z)^2} + \frac{pqz^2}{(1-z)^2}.$$

We set $H(z) = (1-z)^2 G(z)$ and get $H(z) = H(qz) + pqz^2$. Comparing coefficients,
we see that

$$[z^2] H(z) = \frac{pq}{1-q^2} = \frac{q}{1+q}$$

and that the other coefficients are zero. Consequently,

$$H(z) = \frac{q z^2}{1+q} \quad\text{and}\quad G(z) = \frac{q}{1+q} \, \frac{z^2}{(1-z)^2}.$$

Theorem 2.1. The expected value of the number of descents is given for $n \ge 1$ by

$$[z^n] G(z) = (n-1) \frac{q}{1+q}. \tag{3}$$

Now that we see such a simple result, we will be tempted to look for a simple
proof as well. Indeed, since the expectation is additive, we will have $n - 1$ times
the expectation between two adjacent random variables. This quantity is

$$\sum_{i > j \ge 1} pq^{i-1} \, pq^{j-1} = \frac{q}{1+q}.$$

If we want to compute the variance, such a simple argument seems to be
out of reach. Hence we differentiate (2) twice, and use the notation
$V(z) = \frac{\partial^2}{\partial u^2} F(z, u)\big|_{u=1}$. We see

$$V(z) = V(qz) \frac{(1-qz)^2}{(1-z)^2} + \frac{2q^2(1-q) z^3 \big[ q(1+q) + (1-q^2) z - q z^2 \big]}{(1+q)^2 (1-z)^3 (1-qz)},$$

or, with $W(z) = (1-z)^2 V(z)$,

$$W(z) = W(qz) - \frac{2(1-q) q^2 z (1 + (1+q) z + z^2)}{(1+q)^2} + \frac{2q^2}{(1+q)^2} \Big[ \frac{1}{1-z} - \frac{1}{1-qz} \Big].$$

From this we see that (with $w_n := [z^n] W(z)$) for $n \ge 4$,

$$w_n = q^n w_n + \frac{2q^2}{(1+q)^2} \big[ 1 - q^n \big], \quad\text{or}\quad w_n = \frac{2q^2}{(1+q)^2}.$$

Furthermore we see that $w_3 = \frac{2q^3(1-q)}{(1+q)(1-q^3)}$ and $w_0 = w_1 = w_2 = 0$. Hence

$$V(z) = \frac{2q^3(1-q) z^3}{(1+q)(1-q^3)(1-z)^2} + \frac{2q^2 z^4}{(1+q)^2 (1-z)^3}$$

and

$$[z^n] V(z) = \frac{q^2}{(1+q)^2} (n-2)(n-3) + \frac{2q^3(1-q)}{(1+q)(1-q^3)} (n-2).$$

Adding the expectation and subtracting the square of the expectation, we obtain
the variance.

Theorem 2.2. The variance of the number of strict descents is given for $n \ge 2$ by

$$\frac{q(1-q)(1+q^3)}{(1+q)^3 (1-q^3)} \, n - \frac{q(1 - 3q + q^2)}{(1+q)^2 (1+q+q^2)}.$$

2 . 2 . Weak descents 

We will use the same decomposition (1), but with a slightly different translation
into generating functions.

Without explicitly mentioning it, we will use the same functions as before,
adjusted to the new meaning.

$$F(z,u) = \Big(1 + \frac{pz}{1-pzu}\Big) \frac{1}{1 - [F(qz,u) - 1]\frac{pzu}{1-pzu}} \, [F(qz,u) - 1] \Big(1 + \frac{pzu}{1-pzu}\Big) + 1 + \frac{pz}{1-pzu}. \tag{5}$$

Therefore

$$G(z) = G(qz) \frac{(1-qz)^2}{(1-z)^2} + \frac{pz^2}{(1-z)^2}$$

and $H(z) = H(qz) + pz^2$. Consequently,

$$H(z) = \frac{z^2}{1+q} \quad\text{and}\quad G(z) = \frac{1}{1+q} \, \frac{z^2}{(1-z)^2}.$$

Theorem 2.3. The expected value of the number of weak descents is given for $n \ge 1$
by

$$[z^n] G(z) = (n-1) \frac{1}{1+q}. \tag{6}$$







A simple explanation is also available as in the other instance, and therefore
omitted. Now we turn to the variance. We see

$$V(z) = V(qz) \frac{(1-qz)^2}{(1-z)^2} + \frac{2(1-q) z^3 \big[ 1 + q - (1-q^2) q z - q^3 z^2 \big]}{(1+q)^2 (1-z)^3 (1-qz)},$$

or, with $W(z) = (1-z)^2 V(z)$,

$$W(z) = W(qz) - \frac{2(1-q) z (1 + (1+q) z + q^2 z^2)}{(1+q)^2} + \frac{2}{(1+q)^2} \Big[ \frac{1}{1-z} - \frac{1}{1-qz} \Big].$$

From this we see that (with $w_n := [z^n] W(z)$) for $n \ge 4$,

$$w_n = q^n w_n + \frac{2}{(1+q)^2} \big[ 1 - q^n \big], \quad\text{or}\quad w_n = \frac{2}{(1+q)^2}.$$

Furthermore we see that $w_3 = \frac{2(1-q)}{(1+q)(1-q^3)}$ and $w_0 = w_1 = w_2 = 0$. Hence

$$V(z) = \frac{2(1-q) z^3}{(1+q)(1-q^3)(1-z)^2} + \frac{2 z^4}{(1+q)^2 (1-z)^3}$$

and

$$[z^n] V(z) = \frac{(n-2)(n-3)}{(1+q)^2} + \frac{2(1-q)}{(1+q)(1-q^3)} (n-2).$$

Adding the expectation and subtracting the square of the expectation, we obtain
the variance.

Theorem 2.4. The variance of the number of weak descents is given for $n \ge 2$ by

$$\frac{q(1-q)(1+q^3)}{(1+q)^3 (1-q^3)} \, n - \frac{q(1 - 3q + q^2)}{(1+q)^2 (1+q+q^2)}.$$

We note that this quantity is the same as in the instance of the strict descents. 
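
The mean and variance formulas can be cross-checked by computing the exact distribution of the number of descents for short words. The sketch below is an added illustration and not the authors' method: it assumes letters may be truncated at `max_letter` (the neglected mass is of order $q^{\text{max\_letter}}$), and all function names are hypothetical.

```python
def descent_moments(n, q, strict=True, max_letter=60):
    """Mean and variance of the number of descents in n geometric letters.

    Dynamic programming over (last letter, number of descents so far);
    letters above max_letter are ignored, which is an approximation.
    """
    p = 1.0 - q
    probs = [p * q ** k for k in range(max_letter)]  # probs[k] = P(letter = k + 1)
    # state[a][d]: probability that the word ends in letter a + 1 with d descents
    state = [[pa] + [0.0] * (n - 1) for pa in probs]
    for _ in range(n - 1):
        new = [[0.0] * n for _ in probs]
        for a in range(max_letter):
            for b in range(max_letter):
                is_descent = a > b if strict else a >= b
                shift = 1 if is_descent else 0
                for d in range(n - shift):
                    new[b][d + shift] += state[a][d] * probs[b]
        state = new
    dist = [sum(row[d] for row in state) for d in range(n)]
    mean = sum(d * pd for d, pd in enumerate(dist))
    second = sum(d * d * pd for d, pd in enumerate(dist))
    return mean, second - mean * mean


def mean_closed_form(n, q, strict=True):
    # (n - 1) q / (1 + q) for strict, (n - 1) / (1 + q) for weak descents
    return (n - 1) * (q if strict else 1.0) / (1.0 + q)


def variance_closed_form(n, q):
    # same expression for strict and weak descents, valid for n >= 2
    lead = q * (1 - q) * (1 + q ** 3) / ((1 + q) ** 3 * (1 - q ** 3))
    const = q * (1 - 3 * q + q * q) / ((1 + q) ** 2 * (1 + q + q * q))
    return lead * n - const


print(descent_moments(5, 0.5, strict=True))
print(mean_closed_form(5, 0.5), variance_closed_form(5, 0.5))
```

The same variance for both kinds of descents is also visible combinatorially: the number of weak descents equals $n - 1$ minus the number of strict ascents, and strict ascents and strict descents have the same distribution by reversal.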



3. Distribution of the number of descents 

In this section we prove a central limit theorem for the distribution of the number 
of strict descents. In order to do this, we have to extract further information from 
the functional equation (2). We observe that the terms on the right-hand side are 
all simple rational functions, except for the terms containing F{qz,u). From the 
definition of $F(z, u)$ it is clear that $F(z, u)$ can be written as

$$F(z,u) - 1 = \sum_{n \ge 1} z^n f_n(u)$$

for polynomials $f_n(u)$ of degree at most $n$, whose coefficients are nonnegative and $\le 1$.
Therefore we have

$$|f_n(u)| \le \frac{|u|^{n+1} - 1}{|u| - 1} \quad\text{for } |u| > 1. \tag{8}$$

Using $q < 1$ and (8) we obtain that $F(qz, u)$ is holomorphic in $|z| < q^{-1/2}$,
$|u| < q^{-1/2}$. Since for $u = z = 1$ we have

$$1 - (F(q, 1) - 1) \frac{p}{1-p} = 0, \quad\text{and}\quad \frac{\partial}{\partial z} \Big( 1 - (F(qz, u) - 1) \frac{pzu}{1-pz} \Big) \Big|_{z=1, u=1} \ne 0,$$

there exists a function $f(u)$ holomorphic in a neighbourhood of $u = 1$ such that
$z = f(u)^{-1}$ solves

$$1 - (F(qz, u) - 1) \frac{pzu}{1-pz} = 0$$

and satisfies $f(1) = 1$. Furthermore, $|f(e^{it})| < 1$ for $0 < |t| < \varepsilon$ for some $\varepsilon > 0$ by
an application of Rouché's theorem. Thus we can write

$$F(z,u) - 1 = \frac{g(z,u)}{1 - z f(u)} + R(z,u), \tag{9}$$

where $g(z, u)$ and $R(z, u)$ are holomorphic in $|z| < 1 + \delta$, $|u - 1| < \delta$ for some $\delta > 0$.
Now we are in the general framework of Hwang's quasi-power theorem (cf. [5]) and
can deduce the following theorem.



Theorem 3.1. The number of strict descents in words of length $n$ produced by
independent geometric random variables obeys a central limit law, more precisely

$$P\Big( D_n(w) \le \frac{q}{1+q}\, n + t \sqrt{\frac{q(1-q)(1+q^3)}{(1+q)^3 (1-q^3)}} \sqrt{n} \Big) = \Phi(t) + O(n^{-1/2}). \tag{10}$$
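
The limit law can be illustrated by simulation. The following is a rough Monte Carlo sketch added for illustration only; the sampler, the seed, and the sample sizes are arbitrary choices, and the centring and scaling are taken from (10).

```python
import math
import random


def sample_geometric(rng, q):
    """Inverse-transform sample with P(X > k) = q**k, k = 0, 1, 2, ..."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return int(math.log(u) / math.log(q)) + 1


def strict_descents(word):
    return sum(1 for a, b in zip(word, word[1:]) if a > b)


def standardized_counts(n, q, reps, seed=0):
    """Simulate D_n and standardize with the centring and scaling of (10)."""
    rng = random.Random(seed)
    centre = q / (1 + q) * n
    sigma2 = q * (1 - q) * (1 + q ** 3) / ((1 + q) ** 3 * (1 - q ** 3))
    scale = math.sqrt(sigma2 * n)
    values = []
    for _ in range(reps):
        word = [sample_geometric(rng, q) for _ in range(n)]
        values.append((strict_descents(word) - centre) / scale)
    return values


zs = standardized_counts(n=200, q=0.5, reps=2000)
inside = sum(1 for z in zs if abs(z) <= 1.0) / len(zs)
print(round(inside, 3))  # should be roughly Phi(1) - Phi(-1), about 0.68
```

Because $D_n$ is integer valued and $n$ is moderate, the simulated fraction only approximates the Gaussian value; the error term $O(n^{-1/2})$ in (10) is consistent with a discrepancy of this size.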



Similarly we can show 

Theorem 3.2. The number of weak descents in words of length $n$ produced by
independent geometric random variables obeys a central limit law, more precisely

$$P\Big( D_n(w) \le \frac{1}{1+q}\, n + t \sqrt{\frac{q(1-q)(1+q^3)}{(1+q)^3 (1-q^3)}} \sqrt{n} \Big) = \Phi(t) + O(n^{-1/2}). \tag{11}$$



Remarks. In [12] Louchard and Prodinger find an explicit bivariate generating
function for samples of geometric random variables with $k$ strictly ascending runs.
Since the number of weak descents is one less than the number of strictly ascending
runs, we may deduce from their result that in the case of weak descents

$$F(z,u) = \frac{(1-u) P}{1 - uP}, \quad\text{where } P = \prod_{n \ge 0} \big( 1 + pz(1-u) q^n \big). \tag{12}$$

To verify this directly one must show that (12) is the solution of the functional
equation (5) that satisfies $F(0, u) = 1$. This follows easily by substituting (12) in
(5), using that $F(qz, u)$ is given by (12) with $P$ replaced by $P / (1 + pz(1-u))$.

Similarly, in the case of strict descents one easily verifies that the solution of
the functional equation (2) that satisfies $F(0, u) = 1$ is

$$F(z,u) = \frac{1-u}{P - u}, \quad\text{where now } P = \prod_{n \ge 0} \big( 1 - pz(1-u) q^n \big). \tag{13}$$

In both (12) and (13) we may let $q \to 1$ and obtain as expected the bivariate
exponential generating function for the Eulerian numbers

$$A(z,u) = \frac{1-u}{e^{(u-1)z} - u}.$$
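
The product formulas (12) and (13) can be compared numerically with the definition $F(z,u) = \sum_n z^n E\,u^{D_n}$. The sketch below is an added illustration under stated assumptions: both the series in $z$ and the alphabet are truncated (`max_len`, `max_letter`), the product is truncated at `terms` factors, and the function names are hypothetical.

```python
def gf_closed_form(z, u, q, strict=True, terms=200):
    """Evaluate (13) for strict descents or (12) for weak descents."""
    p = 1.0 - q
    prod = 1.0
    for n in range(terms):
        factor = p * z * (1.0 - u) * q ** n
        prod *= (1.0 - factor) if strict else (1.0 + factor)
    if strict:
        return (1.0 - u) / (prod - u)
    return (1.0 - u) * prod / (1.0 - u * prod)


def gf_by_definition(z, u, q, strict=True, max_len=60, max_letter=60):
    """Truncated sum over word lengths of z^n E[u^(number of descents)]."""
    p = 1.0 - q
    probs = [p * q ** k for k in range(max_letter)]  # probs[k] = P(letter = k + 1)
    total = 1.0          # contribution of the empty word
    g = probs[:]         # g[b] = E[u^D ; last letter = b + 1] for words of length 1
    zn = z
    for _ in range(max_len):
        total += zn * sum(g)
        new = []
        for b in range(max_letter):
            s = 0.0
            for a in range(max_letter):
                is_descent = a > b if strict else a >= b
                s += g[a] * (u if is_descent else 1.0)
            new.append(probs[b] * s)
        g = new
        zn *= z
    return total


for strict in (True, False):
    print(gf_closed_form(0.3, 0.7, 0.5, strict), gf_by_definition(0.3, 0.7, 0.5, strict))
```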







4. The position of the first descent 

4.1. Strict descents 

We use the following variation of the number of descents decomposition for the
set of all words:

$$\{\ge 1\}^* = 1^* \{\ge 2\}^+ (1^+ \{\ge 2\}^+)^* (\varepsilon + 1^+) + 1^*. \tag{14}$$

We consider a probability generating function $F(z, u)$, where $z$ labels the
number of random variables, and $u$ marks the initial position of the first descent.
In working with this decomposition it is convenient to adopt the convention that
a nondecreasing word has a descent at its end.

$$F(z,u) = \frac{1}{1-pzu} [F(qz,u) - 1] \frac{1}{1 - [F(qz,1) - 1]\frac{pz}{1-pz}} \Big(1 + \frac{pz}{1-pz}\Big) + \frac{1}{1-pzu}, \tag{15}$$

which simplifies to

$$F(z,u) = \frac{1}{1-pzu} [F(qz,u) - 1] \frac{1-qz}{1-z} + \frac{1}{1-pzu}. \tag{16}$$

Now we differentiate it w.r.t. $u$, plug in $u = 1$, set $G(z) = \frac{\partial}{\partial u} F(z, u)\big|_{u=1}$, and get

$$G(z) = \frac{1}{1-z} \Big[ G(qz) \frac{1-qz}{1-pz} + \frac{pz}{1-pz} \Big].$$

We set $H(z) = (1-z) G(z)$ and get $(1-pz) H(z) = H(qz) + pz$. Comparing
coefficients, we see that $h_n := [z^n] H(z)$ satisfies $h_1 = 1$ and for $n \ge 2$,
$h_n - p h_{n-1} = q^n h_n$. By iterating this recurrence, $h_n = p^n / (q; q)_n$, with the notation
$(a; q)_n = (1-a)(1-aq) \cdots (1-aq^{n-1})$. Consequently,

$$H(z) = \sum_{n \ge 1} \frac{p^n z^n}{(q;q)_n} \quad\text{and}\quad G(z) = \frac{1}{1-z} \sum_{n \ge 1} \frac{p^n z^n}{(q;q)_n} = \frac{1}{1-z} \Big[ \prod_{n \ge 0} \frac{1}{1 - pzq^n} - 1 \Big],$$

by one of Euler's partition identities.

Our convention gives a contribution of n for the position of the first descent
in each nondecreasing word, whereas 0 might be more natural. If we prefer the
latter convention then we should subtract n times the probability $p^n/(q;q)_n$ of a
nondecreasing word from $[z^n]G(z)$ to compute the average.



Theorem 4.1. The average position of the first descent is given for n ≥ 1 by

\[ \sum_{j=1}^{n}\frac{p^j}{(q;q)_j} - \frac{n\,p^n}{(q;q)_n} = \sum_{j=1}^{n}\frac{1}{[j]_q!} - \frac{n}{[n]_q!}. \tag{17} \]

(We used the notation $[n]_q! = (q;q)_n/p^n$.) Note that as q → 1 we find that the
average position of the first descent in a permutation equals $\sum_{j=1}^{n} 1/j! - n/n! \sim e - 1$,
as n → ∞, in agreement with results of Knuth [11]. As q → 0, we obtain a word
of all 1's and hence 0 for the average position of the first descent as expected.

Previously, using a different approach, Prodinger [15] studied the length of
the r-th ascending run in a geometrically distributed word. The length of the first
weakly (strictly) ascending run corresponds to the position of the first strict (weak) 
descent, respectively. 
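Formula (17) is easy to evaluate exactly. The following sketch assumes (17) reads "sum of p^j/(q;q)_j for j = 1..n, minus n p^n/(q;q)_n"; the function names are illustrative only. It uses rational arithmetic so the small cases can be compared with a hand computation: for n = 2 the first strict descent sits at position 1 exactly when x_1 > x_2, which has probability q/(1+q).

```python
from fractions import Fraction
import math

def q_pochhammer(q, n):
    """(q; q)_n = (1 - q)(1 - q^2)...(1 - q^n)."""
    result = Fraction(1)
    for k in range(1, n + 1):
        result *= 1 - q**k
    return result

def avg_first_strict_descent(n, q):
    """Average position of the first strict descent (0 if there is none)."""
    q = Fraction(q)
    p = 1 - q
    total = sum(p**j / q_pochhammer(q, j) for j in range(1, n + 1))
    return total - n * p**n / q_pochhammer(q, n)

if __name__ == "__main__":
    print(avg_first_strict_descent(2, Fraction(1, 2)))           # equals q/(1+q)
    print(float(avg_first_strict_descent(12, Fraction(999, 1000))), math.e - 1)
```

With q = 1/2 and n = 2 the value is q/(1+q) = 1/3, and for q close to 1 the value approaches e - 1, in line with the remarks following the theorem.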




Samples of geometric random variables 






4.2. Weak descents 

We use the same decomposition (14) for the set of all words. 

We consider a probability generating function F(z,u), where z labels the
number of random variables, and u marks the initial position of the first weak
descent. Again it is convenient to adopt a convention that a strictly increasing
word has a weak descent at its end. We split up our decomposition into the cases
of zero or one 1, or two or more 1's. In the latter case a weak descent has already
occurred with initial position equal to 1.



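\[ F(z,u) = (1+pzu)\,[F(qz,u)-1]\,\frac{1}{1-[F(qz,1)-1]\frac{pz}{1-pz}}\Bigl(1+\frac{pz}{1-pz}\Bigr) + \frac{p^2z^2u}{1-pz}\,[F(qz,1)-1]\,\frac{1}{1-[F(qz,1)-1]\frac{pz}{1-pz}}\Bigl(1+\frac{pz}{1-pz}\Bigr) + 1 + \frac{pzu}{1-pz}, \tag{18} \]

which simplifies to

\[ F(z,u) = (1+pzu)\,[F(qz,u)-1]\,\frac{1-qz}{1-z} + \frac{p^2z^2u}{1-pz}\,\frac{qz}{1-z} + 1 + \frac{pzu}{1-pz}. \tag{19} \]

Now we differentiate it w.r.t. u, plug in u = 1, set $G(z) = \frac{\partial}{\partial u}F(z,u)\big|_{u=1}$, and get

\[ G(z) = \frac{1}{1-z}\Bigl(G(qz)(1-qz)(1+pz) + pz\Bigr). \]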

We set H(z) = (1 − z)G(z) and get H(z) = H(qz)(1 + pz) + pz. By comparing
coefficients we find that

\[ H(z) = \sum_{n\ge 1}\frac{p^n q^{\binom{n}{2}} z^n}{(q;q)_n} \quad\text{and}\quad G(z) = \frac{1}{1-z}\sum_{n\ge 1}\frac{p^n q^{\binom{n}{2}} z^n}{(q;q)_n} = \frac{1}{1-z}\Bigl[\prod_{i\ge 0}\bigl(1+pzq^i\bigr) - 1\Bigr], \]

by another of Euler's partition identities.

Since our convention gives a contribution of n rather than 0 for the position
of the first descent in each increasing word, we subtract n times the probability
$p^n q^{\binom{n}{2}}/(q;q)_n$ of a strictly increasing word from $[z^n]G(z)$ to compute the average.



Theorem 4.2. The average position of the first descent is given for n ≥ 1 by

\[ \sum_{j=1}^{n}\frac{p^j q^{\binom{j}{2}}}{(q;q)_j} - \frac{n\,p^n q^{\binom{n}{2}}}{(q;q)_n} = \sum_{j=1}^{n}\frac{q^{\binom{j}{2}}}{[j]_q!} - \frac{n\,q^{\binom{n}{2}}}{[n]_q!}. \tag{20} \]

Note that as q → 1 we find again that the average position of the first descent
in a permutation equals $\sum_{j=1}^{n} 1/j! - n/n! \to e - 1$, as n → ∞. As q → 0, we obtain
a word of all 1's and hence 1 for the average position of the first weak descent as
expected.
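The step "by comparing coefficients" can be replayed mechanically. The sketch below assumes the functional equation H(z) = H(qz)(1 + pz) + pz, whose coefficient form is h_1 = 1 and h_m (1 - q^m) = p q^(m-1) h_(m-1); it checks the resulting h_m against the closed form p^m q^C(m,2) / (q;q)_m and then evaluates the average of Theorem 4.2. All function names are illustrative.

```python
from fractions import Fraction

def h_coefficients(n, q):
    """Coefficients h_0..h_n of H(z), obtained from H(z) = H(qz)(1 + pz) + pz."""
    q = Fraction(q)
    p = 1 - q
    h = [Fraction(0), Fraction(1)]          # h_0 = 0, h_1 = 1
    for m in range(2, n + 1):
        h.append(p * q**(m - 1) * h[m - 1] / (1 - q**m))
    return h[: n + 1]

def closed_form(m, q):
    """p^m q^(m choose 2) / (q; q)_m."""
    q = Fraction(q)
    p = 1 - q
    poch = Fraction(1)
    for k in range(1, m + 1):
        poch *= 1 - q**k
    return p**m * q**(m * (m - 1) // 2) / poch

def avg_first_weak_descent(n, q):
    """[z^n] G(z) minus n times the probability of a strictly increasing word."""
    h = h_coefficients(n, q)
    return sum(h[1:]) - n * h[n]

if __name__ == "__main__":
    q = Fraction(1, 3)
    print([h == closed_form(m, q) for m, h in enumerate(h_coefficients(6, q)) if m >= 1])
    print(avg_first_weak_descent(2, Fraction(1, 2)))   # P(x1 >= x2) = 1/(1+q)
```

For n = 2 the first weak descent is at position 1 exactly when x_1 ≥ x_2, an event of probability 1/(1+q), which gives a direct check of the formula.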



5. The height of the first descent 

5.1. Strict descents 

We stay with decomposition (14) and consider a probability generating function 
F(z,u), where now u marks the initial height of the first descent. Once again we
adopt the convention that a nondecreasing word has a descent at its end.
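\[ F(z,u) = \frac{u}{1-pz}\,[F(qz,u)-1]\,\frac{1}{1-[F(qz,1)-1]\frac{pz}{1-pz}}\Bigl(1+\frac{pz}{1-pz}\Bigr) + 1 + \frac{pzu}{1-pz}, \tag{21} \]

which simplifies to

\[ F(z,u) = \frac{u}{1-pz}\,[F(qz,u)-1]\,\frac{1-qz}{1-z} + 1 + \frac{pzu}{1-pz}. \tag{22} \]

Now we differentiate it w.r.t. u, plug in u = 1, set $G(z) = \frac{\partial}{\partial u}F(z,u)\big|_{u=1}$, and get

\[ G(z) = \frac{1}{1-z}\,\frac{1-qz}{1-pz}\,G(qz) + \frac{z}{1-z}. \]

We set H(z) = (1 − z)G(z) and get (1 − pz)H(z) = H(qz) + z(1 − pz). This leads
to

\[ H(z) = z + \frac{q}{p}\sum_{n\ge 1}\frac{p^n z^n}{(q;q)_n} \quad\text{and}\quad G(z) = \frac{z}{1-z} + \frac{q}{p}\,\frac{1}{1-z}\sum_{n\ge 1}\frac{p^n z^n}{(q;q)_n}. \]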



Theorem 5.1. The average initial height of the first descent is given for n ≥ 1 by

\[ \frac{q}{p}\sum_{j=1}^{n}\frac{p^j}{(q;q)_j} + 1 = \frac{q}{p}\sum_{j=1}^{n}\frac{1}{[j]_q!} + 1. \tag{23} \]

As q → 1 the sum tends to infinity, indicating that the average height of the
first descent in a permutation grows as a function of the size of the permutation.
As q → 0, we obtain a word of all 1's and hence 1 for the average height of the
first strict descent, as expected from our convention that such a word has a strict
descent at the end.

5.2. Weak descents 

Again we use decomposition (14) and consider a probability generating function 
F(z, u), where now u marks the initial height of the first weak descent. Once again 
we adopt the convention that a strictly increasing word has a descent at its end. 

We split up our decomposition into the cases of zero or one 1, or two or more
1's. In the latter case a weak descent has already occurred with initial height equal
to 1:

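\[ F(z,u) = u(1+pz)\,[F(qz,u)-1]\,\frac{1}{1-[F(qz,1)-1]\frac{pz}{1-pz}}\,\frac{1}{1-pz} + \frac{p^2z^2u}{1-pz}\,[F(qz,1)-1]\,\frac{1}{1-[F(qz,1)-1]\frac{pz}{1-pz}}\,\frac{1}{1-pz} + 1 + \frac{pzu}{1-pz}. \tag{24} \]

Again we simplify, differentiate w.r.t. u, plug in u = 1, set $G(z) = \frac{\partial}{\partial u}F(z,u)\big|_{u=1}$, and
get

\[ G(z) = \frac{1}{1-z}\Bigl(G(qz)(1-qz)(1+pz) + z\Bigr). \]

We set H(z) = (1 − z)G(z) and get H(z) = H(qz)(1 + pz) + z. This leads to

\[ H(z) = \frac{1}{p}\sum_{n\ge 1}\frac{p^n q^{\binom{n}{2}} z^n}{(q;q)_n} \quad\text{and}\quad G(z) = \frac{1}{p(1-z)}\sum_{n\ge 1}\frac{p^n q^{\binom{n}{2}} z^n}{(q;q)_n}. \]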




Theorem 5.2. The average initial height of the first weak descent is given for n ≥ 1
by

\[ \frac{1}{p}\sum_{j=1}^{n}\frac{p^j q^{\binom{j}{2}}}{(q;q)_j} = \frac{1}{p}\sum_{j=1}^{n}\frac{q^{\binom{j}{2}}}{[j]_q!}. \tag{25} \]

As q → 1 we again find that the average height of the first descent in a
permutation grows with n. As q → 0, we obtain a word of all 1's and hence obtain
1 for the average height of the first weak descent as expected.
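Both height formulas admit a simple sanity check for n = 2. Under the stated conventions the height of the first (strict or weak) descent in a two-letter word is max(x_1, x_2), whose mean for geometric letters is 2/p - 1/(1 - q^2). The sketch below assumes Theorems 5.1 and 5.2 read as the sums "1 + (q/p) Σ p^j/(q;q)_j" and "(1/p) Σ p^j q^C(j,2)/(q;q)_j"; the function names are illustrative.

```python
from fractions import Fraction

def _poch(q, n):
    """(q; q)_n."""
    out = Fraction(1)
    for k in range(1, n + 1):
        out *= 1 - q**k
    return out

def avg_height_strict(n, q):
    """Average initial height of the first strict descent (Theorem 5.1 form)."""
    q = Fraction(q)
    p = 1 - q
    return 1 + (q / p) * sum(p**j / _poch(q, j) for j in range(1, n + 1))

def avg_height_weak(n, q):
    """Average initial height of the first weak descent (Theorem 5.2 form)."""
    q = Fraction(q)
    p = 1 - q
    return sum(p**j * q**(j * (j - 1) // 2) / _poch(q, j) for j in range(1, n + 1)) / p

def expected_max_of_two(q):
    """E max(x1, x2) for two independent geometric letters with P(x = k) = p q^(k-1)."""
    q = Fraction(q)
    p = 1 - q
    return 2 / p - 1 / (1 - q**2)

if __name__ == "__main__":
    q = Fraction(1, 2)
    print(avg_height_strict(2, q), avg_height_weak(2, q), expected_max_of_two(q))
```

For n = 1 both formulas reduce to the mean 1/p of a single geometric letter, which is what the end-of-word convention predicts.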



6. The maximum size of a descent 



We study the following parameter X:

\[ X := \max\{x_i - x_{i+1} \mid 1 \le i < n \ \&\ x_i > x_{i+1}\}; \tag{26} \]

it is consistent to assign the value 0 if no pair of consecutive letters satisfies the
condition. It is more convenient to think from right to left, and we may still call
it X; the statistics are the same:

\[ X := \max\{x_{i+1} - x_i \mid 1 \le i < n \ \&\ x_i < x_{i+1}\}. \]

Fix a parameter h and consider all words with X ≤ h. That is, for all 1 ≤ i < n,
we must have that $x_{i+1} \in \{1, 2, \dots, x_i + h\}$.

The adding-a-new-slice technique works here: Let $f_n(u)$ be such that $[u^i]f_n(u)$
is the probability that a word of length n has X ≤ h and last letter equal to i.
Then $f_1(u) = \frac{pu}{1-qu}$, and we have the replacement rule

\[ u^i \longrightarrow pu + pqu^2 + \cdots + pq^{i+h-1}u^{i+h} = pu\,\frac{1-(qu)^{i+h}}{1-qu}. \]

Consequently,

\[ f_{n+1}(u) = \frac{pu}{1-qu}\,f_n(1) - \frac{pu(qu)^h}{1-qu}\,f_n(qu). \tag{27} \]
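The replacement rule can be applied directly to coefficient vectors. The following is a minimal numerical sketch, assuming the constraint is "new letter j ≤ previous letter i + h" (that is, X ≤ h read from left to right) and truncating the alphabet at a hypothetical cutoff `alphabet`; neither the function name nor the cutoff comes from the paper.

```python
def prob_max_ascent_at_most(n, h, q, alphabet=400):
    """P(X <= h) for a word of n geometric letters, via the slice recursion.

    f[j-1] holds the probability that the current word satisfies the
    constraint and ends with letter j (letters above `alphabet` are dropped).
    """
    p = 1.0 - q
    f = [p * q ** (j - 1) for j in range(1, alphabet + 1)]    # f_1
    for _ in range(n - 1):
        # tail[i] = sum of f[m] for m >= i (0-based), computed right to left
        tail = [0.0] * (alphabet + 1)
        for i in range(alphabet - 1, -1, -1):
            tail[i] = tail[i + 1] + f[i]
        # letter j may follow letter i iff j <= i + h, i.e. i >= max(j - h, 1)
        f = [p * q ** (j - 1) * tail[max(j - h, 1) - 1]
             for j in range(1, alphabet + 1)]
    return sum(f)

if __name__ == "__main__":
    q, h = 0.5, 1
    print(prob_max_ascent_at_most(2, h, q), 1 - q ** (h + 1) / (1 + q))
```

For n = 2 the probability can be computed by hand as 1 - q^(h+1)/(1+q), which the recursion reproduces up to the truncation error of order q^alphabet.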

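Now define $F(u) = F(z,u) = \sum_{n\ge 1} f_n(u)z^n$. Then, upon summing,

\[ \sum_{n\ge 1} f_{n+1}(u)z^{n+1} = \frac{pzu}{1-qu}\sum_{n\ge 1} f_n(1)z^n - \frac{pzu(qu)^h}{1-qu}\sum_{n\ge 1} f_n(qu)z^n, \]

or

\[ F(z,u) - zf_1(u) = \frac{pzu}{1-qu}\,F(z,1) - \frac{pzu(qu)^h}{1-qu}\,F(z,qu). \]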
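Now we iterate the functional equation

\[ F(u) = \frac{pzu}{1-qu} + \frac{pzu}{1-qu}\,F(1) - \frac{pzu(qu)^h}{1-qu}\,F(qu). \tag{28} \]

Setting u = 1 and collecting terms (with $(q)_k := (q;q)_k$):

\[ F(1) = \sum_{k\ge 1}\frac{(-1)^{k-1}p^kz^kq^{(h+1)\binom{k}{2}}}{(q)_k} + F(1)\sum_{k\ge 1}\frac{(-1)^{k-1}p^kz^kq^{(h+1)\binom{k}{2}}}{(q)_k}. \]

It is better now to include the empty word and thus work with

\[ 1 + F(1) = \Bigl(1 - \sum_{k\ge 1}\frac{(-1)^{k-1}p^kz^kq^{(h+1)\binom{k}{2}}}{(q)_k}\Bigr)^{-1}. \tag{29} \]

We must study the dominant zeros $\rho_h$ of

\[ 1 - \sum_{k\ge 1}\frac{(-1)^{k-1}p^kz^kq^{(h+1)\binom{k}{2}}}{(q)_k}. \tag{30} \]

Bootstrapping is the appropriate technique in this instance:

\[ 1 - z + \frac{z^2q^{h+1}}{1+q} \approx 0. \]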



Set ph = l + Eh, then Sh ~ We have 1 + F{1) ~ — 

1 



Ah = 



ph<y'{ph) 



VE 



zjPh 



with 



k>\ 



( 9 ) A 



(31) 



So the probability that X < h is approximated by 









As one can check, ~ 1, and thus we may work with exp(— 

We now apply the Mellin transform (cf. [2]) to the function f{t) = J2h>o “ 
exp(— This yields the transformed function 

/*(s) = r{s) ? for — 1 <^s < 0. 

Application of the Mellin inversion formula, shifting the line of integration to the 
right and collecting (negative) residues yields 



1 /.-l+ZCX) Q , 






fcGZ ^ 2~ioo 'i 



(32) 



The residues are easily computed to be 

9® 



-ResP(5)V^r 

1 — i 

_Resr(s)il^t- 



s=0 






4K^)(^) 



2k-ni 

L 



(33) 

(34) 



This yields 







Theorem 6.1. The mean value of the size of the greatest descent Mn{w) in a string 
of n geometric random variables satisfies 



EM„{w) ~ logQ n + ^ - Mg+ i ) . + 






(35) 



where S{x) is a periodic function of mean zero, period one and small magnitude, 
with Fourier series 






2km\ 



2k-Kix 



k^O 



References 

[1] L. Devroye. A limit theory for random skip lists. Advances in Applied Probability, 
2:597-609, 1992. 

[2] P. Flajolet, X. Gourdon, and P. Dumas. Mellin transforms and asymptotics: Har- 
monic sums. Theoretical Computer Science, 144:3-58, 1995. 

[3] P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base appli- 
cations. Journal of Computer and System Sciences, 31:182-209, 1985. 

[4] P. J. Grabner, A. Knopfmacher and H. Prodinger. Combinatorics of geometrically 
distributed random variables: Run statistics. Theoret. Comp. Sci., 297: 261-270, 
2003. 

[5] H.-K. Hwang. On convergence rates in the central limit theorems for combinatorial 
structures. European Journal of Combinatorics, 19:329-343, 1998. 

[6] P. Kirschenhofer and H. Prodinger. On the analysis of probabilistic counting. In 
E. Hlawka and R. F. Tichy, editors. Number-theoretic Analysis, volume 1452 of 
Lecture Notes in Mathematics, pages 117-120, 1990. 

[7] P. Kirschenhofer and H. Prodinger. A result in order statistics related to probabilistic 
counting. Computing, 51:15-27, 1993. 

[8] P. Kirschenhofer and H. Prodinger. The path length of random skip lists. Acta In- 
formatica, 31:775-792, 1994. 

[9] P. Kirschenhofer, H. Prodinger, and W. Szpankowski. Analysis of a splitting process 
arising in probabilistic counting and other related algorithms. Random Structures 
and Algorithms, 9:379-401, 1996. 

[10] A. Knopfmacher and H. Prodinger. Combinatorics of geometrically distributed ran- 
dom variables: Value and position of the rth left-to-right maximum. Discrete Math., 
226:255-267, 2001. 

[11] D. E. Knuth. The Art of Computer Programming, volume 3: Sorting and Searching.
Addison-Wesley, 1973. Second edition, 1998.

[12] G. Louchard and H. Prodinger. Ascending runs of sequences of geometrically dis- 
tributed random variables: a probabilistic analysis. Theoretical Computer Science, 
304:59-86, 2003. 

[13] T. Papadakis, I. Munro, and P. Poblete. Average search and update costs in skip 
lists. BIT, 32:316-332, 1992. 

[14] H. Prodinger. Combinatorics of geometrically distributed random variables: Left-to- 
right maxima. Discrete Mathematics, 153:253-270, 1996. 

[15] H. Prodinger. Combinatorics of geometrically distributed random variables: Length 
of ascending runs. Proceedings of LATIN’2000, Lecture Notes in Comp. Sci. 1776, 
Gonnet, Panario and Viola (Eds.), 473-482, 2000. 

[16] W. Pugh. Skip lists: a probabilistic alternative to balanced trees. Communications 
of the ACM, 33:668-676, 1990. 







Arnold Knopfmacher

The John Knopfmacher Centre for Applicable Analysis and Number Theory 
University of the Witwatersrand, P. O. Wits 
2050 Johannesburg, South Africa 
arnoldkn@cam.wits.ac.za

Helmut Prodinger 

The John Knopfmacher Centre for Applicable Analysis and Number Theory 

Department of Mathematics 

University of the Witwatersrand, P. O. Wits 

2050 Johannesburg, South Africa 

helmut@maths.wits.ac.za




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Large Deviations for Cascades and Cascades of 
Large Deviations 

Alain Rouault 



ABSTRACT: In Mandelbrot's multiplicative cascade on [0,1], let r be
the number of cells, and $Z_r^n$ the mass of the measure at the height n in the r-ary
tree. Most of the known results deal with the limit n → ∞ with r fixed. In
some previous papers, we began the study of r → ∞ when n is fixed, showing
a law of large numbers and describing a new large deviations phenomenon when
the cascade generator is bounded. Actually, there are self-similar rate functions at
geometrical scales, reflecting the multiplicative structure of the process. Here we
describe the behavior of a typical cascade, knowing that $Z_r^n$ is given different from
its mean 1. The main result is a Sanov type theorem, at different rates, leading to
a Gibbs conditioning principle.



1. Introduction 

Cascades are a very popular mathematical model for rainfall, internet packet traffic,
market prices, etc. The literature is now very broad, from the foundations
(Kolmogorov [12], Mandelbrot [17]) to more recent improvements. Extensive
bibliographies are in Barral [1] or Liu [13], and Ossiander and Waymire [18] for
statistical aspects.

Let W ≥ 0 be a random variable of mean 1. The cascade generators are given by
a family $\{W_{\mathbf i}\}$ of independent copies of W, indexed by the set of all finite sequences
$\mathbf i = i_1 \cdots i_n$, n ≥ 1, of positive integers. We are interested in the random variable

\[ Z_r^n = r^{-n}\sum_{1\le i_1,\dots,i_n\le r} W_{i_1}W_{i_1i_2}\cdots W_{i_1\cdots i_n}, \]

which is the mass of the cascade at height n. Let $\mathcal F_n$ be the σ-field generated by
all the $W_{\mathbf i}$ for those $\mathbf i$ of height less or equal to n. For r fixed, $\{Z_r^n\}_n$ is a $\{\mathcal F_n\}_n$
nonnegative martingale. Let $Z_r^\infty$ be its a.s. limit when n → ∞. It is a.s. strictly
positive if log r > E W log W (Kahane-Peyrière [11]).

Another asymptotics consists in letting r → ∞ with n fixed (finite or infinite).
For n = 1 it is the classical sum of i.i.d. random variables. For n ≥ 2, the summands
are dependent: the terms corresponding to $\mathbf i$ and $\mathbf j$ have in their products a number
of common random variables equal to the height of the last common ancestor $\mathbf i \wedge \mathbf j$
of $\mathbf i$ and $\mathbf j$. Notice that it is also possible to write $Z_r^n = r^{-1}\sum_{k=1}^{r} W_k\,(Z_r^{n-1}\circ\theta_k)$
a.s., where $\theta_k$ is the tree shifted at the node of label k, and $Z_r^n$ appears as
a sum of r i.i.d. random variables, with a distribution depending on r. In [15], Liu
and Rouault and in [14], Liu et al. extended results known for n = 1, in particular
the law of large numbers. For n < ∞ fixed, they proved that $\lim_{r\to\infty} Z_r^n = 1$ a.s.
(remember E W = 1). It means roughly that only the first level contributes to the
limit. It is also true for other types of results (CLT, LIL).
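The recursive description of the mass translates into a short simulation. The sketch below is an illustration under stated assumptions: it takes the mass at height 0 to be 1, uses the recursion "mass at height n = average over the r children of W_k times the mass of the subtree at height n - 1", and draws W from a user-supplied sampler. The function name and the two-point generator are hypothetical choices, not taken from the paper.

```python
import random

def cascade_mass(n, r, sample_w, rng):
    """Simulate the mass at height n of an r-ary multiplicative cascade.

    Uses Z^n = (1/r) * sum_{k=1..r} W_k * (independent copy of Z^{n-1}),
    with Z^0 = 1 (assumption). The cost is of order r**n generator draws.
    """
    if n == 0:
        return 1.0
    total = 0.0
    for _ in range(r):
        total += sample_w(rng) * cascade_mass(n - 1, r, sample_w, rng)
    return total / r

def two_point_generator(rng):
    """Example generator with mean 1: W = 0.5 or 1.5 with probability 1/2 each."""
    return 0.5 if rng.random() < 0.5 else 1.5

if __name__ == "__main__":
    rng = random.Random(2004)
    for r in (2, 8, 32):
        print(r, cascade_mass(2, r, two_point_generator, rng))
```

As r grows with n fixed, the printed masses should concentrate near 1, which is the law of large numbers recalled above; large deviations concern the rare runs where this fails.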







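In the study of large deviations, a new phenomenon occurs under the assumption

\[ \overline{w} := \operatorname{ess\,sup} W < \infty. \tag{1} \]

Let Λ be the cumulant generating function:

\[ \Lambda(\theta) = \log \mathbb{E}\,e^{\theta W}, \qquad \theta \in \mathbb{R}, \]

and Λ* be its Legendre dual:

\[ \Lambda^*(y) = \sup_{\theta\in\mathbb{R}}\bigl(\theta y - \Lambda(\theta)\bigr), \qquad y \in \mathbb{R}. \]

It is well known that Λ* is finite for $y \in (\underline{w}, \overline{w})$ where $\underline{w} := \operatorname{ess\,inf} W$, and infinite
for $y \notin [\underline{w}, \overline{w}]$.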
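For a generator with finitely many values the Legendre dual can be computed numerically. The sketch below is only an illustration: the two-point law W ∈ {0, 2} with equal probabilities, the search interval, and the function names are assumptions. For this law Λ*(y) coincides with the relative entropy of a Bernoulli(y/2) law with respect to Bernoulli(1/2), which gives a closed form to compare with.

```python
import math

def cumulant(theta, values, probs):
    """Lambda(theta) = log E exp(theta * W) for a finitely supported W."""
    shift = max(theta * v for v in values)          # avoids overflow
    return shift + math.log(sum(p * math.exp(theta * v - shift)
                                for v, p in zip(values, probs)))

def legendre_dual(y, values, probs, lo=-60.0, hi=60.0, iters=200):
    """Lambda*(y) = sup_theta (theta*y - Lambda(theta)), by ternary search.

    The objective is concave in theta; the search is restricted to [lo, hi],
    so values of y outside the support give a large finite number, not +inf.
    """
    def objective(theta):
        return theta * y - cumulant(theta, values, probs)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective((lo + hi) / 2.0)

if __name__ == "__main__":
    values, probs = [0.0, 2.0], [0.5, 0.5]
    for y in (1.0, 1.5):
        s = y / 2.0
        exact = s * math.log(2 * s) + (1 - s) * math.log(2 * (1 - s))
        print(y, legendre_dual(y, values, probs), exact)
```

The dual vanishes at the mean y = 1 and is strictly positive elsewhere inside the support, which is the behaviour used in the tail estimates that follow.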

In [15] it is proved that the family of distributions of $Z_r^\infty$ satisfies the following
tail estimates, at different rates. Roughly, every level contributes to large deviations,
at an appropriate scale and appropriate rate, with a kind of self-similarity,
justifying the title of the present paper.

Theorem 1.1. ([15]) If assumption (1) holds, then for any k ≥ 0 and any $x \in (\overline{w}^{\,k}, \overline{w}^{\,k+1})$,

\[ \lim_{r\to\infty} r^{-(k+1)}\log \mathbb{P}(Z_r^\infty > x) = -\Lambda^*(x\,\overline{w}^{\,-k}). \]

Actually, at intermediate levels (n fixed finite) we have a similar result (see
i) below). The result for the first range $(1, \overline{w})$ (corresponding to k = 0) may be
extended to the case $\overline{w} = \infty$ under the assumption

\[ \forall\,\theta > 0, \quad \mathbb{E}\exp(\theta W) < \infty, \tag{2} \]

(see ii) below). The proof of i) is similar to the proof of [15] Theorem 1.4. We
could also use [16]. Part ii) is a consequence of [14].

Proposition 1.2. i) If assumption (1) holds, then for any 0 ≤ k < n and
$x \in (\overline{w}^{\,k}, \overline{w}^{\,k+1})$,

\[ \lim_{r\to\infty} r^{-(k+1)}\log \mathbb{P}(Z_r^n > x) = -\Lambda^*(x\,\overline{w}^{\,-k}). \]

ii) If assumption (2) holds, then for any x > 1,

\[ \lim_{r\to\infty} r^{-1}\log \mathbb{P}(Z_r^n > x) = -\Lambda^*(x). \]

It should be clear that the same results hold when one considers left tails,
i.e. $\mathbb{P}(Z_r^n < x)$ for x < 1. For the analog of i) we put no assumption, and for ii)
we assume $\underline{w} > 0$.

A quite natural question is then the description of a typical cascade of branching
number r (large), of mass approximately a ≠ 1. We use large deviations techniques
for which we recall some definitions.

Let $a_r$, $r \in \mathbb{N}$, be an increasing sequence of positive real numbers with $\lim_{r\to\infty} a_r = \infty$.
We say that a sequence of probability measures $(P_r)$ on a regular Hausdorff
space $(X, \mathcal{B}(X))$ satisfies a Large Deviation Principle (LDP) with rate function I
at scale $a_r$ if I is a lower semi-continuous function I : X → [0, ∞] such that

a) For any closed subset F of X

\[ \limsup_{r\to\infty}\frac{1}{a_r}\log P_r(F) \le -\inf_{x\in F} I(x). \]




Cascades and Large Deviations 






b) For any open subset G of X

\[ \liminf_{r\to\infty}\frac{1}{a_r}\log P_r(G) \ge -\inf_{x\in G} I(x). \]

The rate function is good if for every c the level set {x : I(x) ≤ c} is compact.
The sequence $(P_r)$ is said to be exponentially tight at scale $a_r$ if for every L > 0
there is a compact set $K_L$ such that $\limsup_r \frac{1}{a_r}\log P_r(K_L^c) \le -L$.

As a first answer to the question of typical cascade, our main result (Theorem 
4.2) is a Gibbs conditioning principle. If the cascade generator W satisfies (1) , then 
for a G and n fixed, the distribution of {Wi ^ , * * • ) - in) conditioned 

on G [a — rj,a + rj] converges weakly to 0 as r ^ oo 

followed by 7 / — > 0. Here Sc denotes the Dirac mass at c, is obtained by tilting 
Q to get mean instead of mean 1. It is a consequence of a Sanov type 

theorem (Theorem 2.1) giving the LDP for the empirical distribution: 

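\[ L_r^n = r^{-n}\sum_{1\le i_1,\dots,i_n\le r}\delta_{(W_{i_1},\,W_{i_1i_2},\,\dots,\,W_{i_1\cdots i_n})}. \tag{3} \]

This empirical distribution is a random measure on $[0,\infty)^n$. In Section 3, we use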
the contraction principle to study again the masses and in Section 4, we deduce 
the typical behavior along a fixed branch. It would be interesting to have a more 
general Gibbs principle, taking into account the limiting distribution along a fixed 
number (> 1) of branches, like in Corollary 7.3.5 of [4]. 

The structure of our problem^1 leads us to conditioning at stage n upon the
σ-field $\mathcal F_{n-1}$, and to use induction. In particular, we have to argue about conditional
LDP, or in other words, we have to manage mixtures of distributions which satisfy
LDP. Various authors have studied this topic: Dinwoodie-Zabell [6] for exchangeable
vectors (and more recently Trashorras [20]), Chaganty [3] for statistical purposes,
Grunwald for statistical mechanics ([10] section 2), Finnoff for evolutionary
games [9], and Biggins for random graphs in [2]. Here, we have tried an (almost)
self-contained treatment of this question adapted to our model.

Let us give some notations used in the sequel. First, E is a Polish space in
Section 2, and is $[0, \overline{w}]$ in the end of Section 3 (Corollary 3.2) and in Section 4. We
denote by $b_m(E)$ the set of bounded Borel functions from E to $\mathbb{R}$. Let $M_1(E)$ be
the set of probability measures on E equipped with the weak convergence denoted
by ⇒. If $f \in b_m(E)$ and $\nu \in M_1(E)$ we denote as usual

\[ \langle f, \nu\rangle = \int_E f\,d\nu. \]

The relative entropy or Kullback information between two probability measures α
and β in $M_1(E)$ is defined by:

\[ H(\alpha\,\|\,\beta) = \int_E \log\frac{d\alpha}{d\beta}\,d\alpha \]

whenever α is absolutely continuous with respect to β. Otherwise $H(\alpha\,\|\,\beta) = +\infty$
(see [4] Appendix D3). We will use the following variational formula ([7] Lemma
1.4.3 p.36):

\[ H(\alpha\,\|\,\beta) = \sup\bigl\{\langle\varphi,\alpha\rangle - \log\langle e^{\varphi},\beta\rangle : \varphi \in b_m(E)\bigr\}. \tag{4} \]
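On a finite space the variational formula (4) can be verified by hand or by machine. The sketch below is a small illustration with made-up probability vectors; the function names are not from the paper. It computes the relative entropy directly, evaluates the functional φ ↦ ⟨φ, α⟩ − log⟨e^φ, β⟩ at the maximizer φ = log(dα/dβ), and checks that random test functions never exceed the entropy.

```python
import math
import random

def relative_entropy(alpha, beta):
    """H(alpha || beta) for probability vectors on a finite set."""
    total = 0.0
    for a, b in zip(alpha, beta):
        if a > 0.0:
            if b == 0.0:
                return math.inf      # alpha is not absolutely continuous w.r.t. beta
            total += a * math.log(a / b)
    return total

def variational_objective(phi, alpha, beta):
    """<phi, alpha> - log <exp(phi), beta>."""
    mean_phi = sum(a * f for a, f in zip(alpha, phi))
    log_mgf = math.log(sum(b * math.exp(f) for b, f in zip(beta, phi)))
    return mean_phi - log_mgf

if __name__ == "__main__":
    alpha = [0.2, 0.5, 0.3]
    beta = [0.4, 0.4, 0.2]
    best_phi = [math.log(a / b) for a, b in zip(alpha, beta)]
    print(relative_entropy(alpha, beta), variational_objective(best_phi, alpha, beta))
    rng = random.Random(1)
    trial = [rng.uniform(-3, 3) for _ in alpha]
    print(variational_objective(trial, alpha, beta))   # never above the entropy
```

The inequality for arbitrary φ is Jensen's inequality in disguise, which is the same mechanism the proofs below rely on when they pass from the variational to the non-variational form of a rate function.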



^1 This model involves a dependence between strings of variables different from that of U-empirical
measures (see [8]).







For $\nu \in M_1(E^n)$ and 1 ≤ j ≤ n, let $\nu_j$ be the projection of ν on the first j
coordinates, and let $\pi\nu = \nu_{n-1}$.

The random variable W is E-valued, and its distribution is denoted by Q. In
Sections 3 and 4, we assume that E W = 1. For j ≥ 1 let $Q^j = Q^{\otimes j}$ (and $\nu\otimes Q^0 = \nu$
for every measure ν).

For $f : E^n \to \mathbb{R}$ let $\pi f = \mathbb{E} f(\cdot, W)$, or in other words $\pi f(\mathbf x) = \int f(\mathbf x, y)\,Q(dy)$.



2. A Sanov type theorem 

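In the whole section we fix n < ∞. We consider $L_r^n$ as a (random) element of $M_1(E^n)$
equipped with the weak topology. Let S be the support of Q.

Theorem 2.1. a) The family $\{\mathbb{P}(L_r^n \in \cdot)\}_r$ satisfies the LDP at scale r with
good rate function

\[ I_1^n(\nu) = H(\nu_1\,\|\,Q) \ \text{ if } \nu = \nu_1\otimes Q^{n-1}, \qquad = +\infty \ \text{ otherwise}. \tag{5} \]

b) If S is compact, then for every 2 ≤ k ≤ n, the family $\{\mathbb{P}(L_r^n \in \cdot)\}_r$
satisfies the LDP in $M_1(S^n)$ at scale $r^k$ with rate function

\[ I_k^n(\nu) = H(\nu_k\,\|\,\nu_{k-1}\otimes Q) \ \text{ if } \nu = \nu_k\otimes Q^{n-k}, \qquad = +\infty \ \text{ otherwise}. \tag{6} \]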

For n = 1, a) is the classical Sanov theorem ([4] p.261). For n > 1, our
strategy uses conditioning, LDP for conditional distributions and induction. As
usual, we need exponential tightness, provided by the following lemma. Its proof
involves some important formulas used in the sequel.

Lemma 2.2. a) If n ≥ 1, $\{\mathbb{P}(L_r^n \in \cdot)\}_r$ is exponentially tight at scale r.
b) For n ≥ 2, and for every sequence $\{\xi_r\}_r \subset M_1(E^{n-1})$ such that $\xi_r \Rightarrow \xi$,
$\{\mathbb{P}(L_r^n \in \cdot \mid L_r^{n-1} = \xi_r)\}_r$ is exponentially tight at scale r.

To prove this lemma, we use a slightly modified version of the criterion of
exponential tightness of Deuschel-Stroock, Lemma 3.2.7 p.67 of [5].

Lemma 2.3. If $(\gamma_r)_r$ is a family of random probability measures on a Polish space
Y, then the family $\{\mathbb{P}(\gamma_r \in \cdot)\}_r$ is exponentially tight at scale $a_r$ as soon as there
is a tight family of probability measures $(\rho_r)_r$ such that for every $\varphi \in b_m(Y)$

\[ \mathbb{E}\exp\langle\varphi, a_r\gamma_r\rangle \le \langle\exp\varphi, \rho_r\rangle^{a_r}. \tag{7} \]

Proof of Lemma 2.3: We give a construction similar to that of Deuschel-Stroock’s 
(see also Dembo-Zeitouni’s Lemma 6.2.6 [4]). For every L > 1, we have to find a 
compact set Cl ^ Mi(y) such that for every r > 1 

?J(7r i Cl) < 2 exp -Lor . (8) 

For I >llete^ = e ^2 Since {pr}r is tight, one can find a compact such 
that Pr{Ki) > 1 - for every r. We claim that Cl = n^>L > 1 - 

suits. Thanks to the Portmanteau theorem, the set Cl is closed; since it is tight, 
it is compact by Prohorov’s theorem. We have for f^=£‘^-\- £log2 on and 0 
on Ki 



VilriKi) < 1 - £ ^) < (£(exp < fe,ar'yr >)(2e^) 







Now, from (7) £(exp < fe,a,rjr >) < < exp/^,Pr >“*■ and from the definition of 
Kt, 

<eyi^fe,pr>= Pr{K^)-\-e^^2^{l- pr{K()) < 2. 

With the three last displays and the union of events bound we get ^(7r ^ < 

Yli>L ? which yields (8). □ 

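Proof of Lemma 2.2: For n = 1, the statement a) is classical ([4] Lemma 6.2.6).
We assume from now n > 1 and S is unbounded (otherwise it is trivial). Let us
begin with a display often used in the sequel. If $h \in b_m(S^n)$, we have

\[ \mathbb{E}\bigl(\exp\langle r^n h, L_r^n\rangle \mid \mathcal F_{n-1}\bigr) = \exp\langle r^n\hat h, L_r^{n-1}\rangle, \tag{9} \]

where $\hat h(x_1, x_2, \dots, x_{n-1}) = \log\mathbb{E}\exp h(x_1, x_2, \dots, x_{n-1}, W)$, and in particular

\[ r^{-n}\log\mathbb{E}\bigl(\exp\langle r^n h, L_r^n\rangle \mid L_r^{n-1} = \mu\bigr) = \langle\log\mathbb{E}\exp h(\cdot, W), \mu\rangle. \tag{10} \]

a) Let us prove by induction on n that for every $\varphi \in b_m(E^n)$

\[ \mathbb{E}\exp\langle\varphi, rL_r^n\rangle \le \bigl(\mathbb{E}\exp\varphi(W_1, \dots, W_n)\bigr)^r, \tag{11} \]

which is (7) with $\rho_r = Q^n$, $a_r = r$. For n = 1, we have actually an equality.

Assume that (11) holds at level n − 1. From (9)

\[ \mathbb{E}\exp\langle\varphi, rL_r^n\rangle = \mathbb{E}\exp\langle\tilde\varphi, rL_r^{n-1}\rangle \]

with $\tilde\varphi(\mathbf x) = r^{n-1}\log\mathbb{E}\exp\bigl(r^{1-n}\varphi(\mathbf x, W)\bigr)$. We apply (11) to $\tilde\varphi \in b_m(E^{n-1})$:

\[ \mathbb{E}\exp\langle\varphi, rL_r^n\rangle \le \bigl[\mathbb{E}\exp\tilde\varphi(W_1, \dots, W_{n-1})\bigr]^r \tag{12} \]
\[ = \Bigl(\mathbb{E}\Bigl[\bigl(\mathbb{E}\{\exp r^{1-n}\varphi(W_1, \dots, W_n) \mid W_1, \dots, W_{n-1}\}\bigr)^{r^{n-1}}\Bigr]\Bigr)^r \]
\[ \le \bigl(\mathbb{E}\bigl[\mathbb{E}\{\exp\varphi(W_1, \dots, W_n) \mid W_1, \dots, W_{n-1}\}\bigr]\bigr)^r \tag{13} \]

where the last display comes from Jensen's inequality with the convex function
$x \mapsto x^{r^{n-1}}$. This yields (11) at level n.

b) Let $\varphi \in b_m(S^n)$. Applying (10) with $h = r^{1-n}\varphi$ we get

\[ r^{-1}\log\mathbb{E}\bigl(\exp\langle\varphi, rL_r^n\rangle \mid L_r^{n-1} = \xi_r\bigr) = r^{n-1}\langle\log\mathbb{E}\exp r^{1-n}\varphi(\cdot, W), \xi_r\rangle \]
\[ \le r^{n-1}\log\langle\mathbb{E}\exp r^{1-n}\varphi(\cdot, W), \xi_r\rangle \]
\[ \le \log\langle\mathbb{E}\exp\varphi(\cdot, W), \xi_r\rangle \]

where we use twice Jensen's inequality (concavity of logarithm and convexity of
$x \mapsto x^{r^{n-1}}$). The family $(\xi_r\otimes Q)_r$ is tight, so we have (7) with $\rho_r = \xi_r\otimes Q$,
$a_r = r$. □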

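The large deviations for conditional distributions are ruled by the following
lemma. Let χ(a|b) = 0 if a = b and χ(a|b) = ∞ if a ≠ b. For $\xi \in M_1(E^{n-1})$ and
$f \in b_m(E^n)$ let

\[ \Lambda_\xi(f) := \langle\log\mathbb{E}\exp f(\cdot, W), \xi\rangle. \]

Lemma 2.4. Let us assume $\xi_r \Rightarrow \xi$.

a) For every k < n, the family $\{\mathbb{P}(L_r^n \in \cdot \mid L_r^{n-1} = \xi_r)\}_r$ satisfies the LDP
at scale $r^k$ with good rate function $\chi(\cdot \mid \xi\otimes Q)$.

b) The family $\{\mathbb{P}(L_r^n \in \cdot \mid L_r^{n-1} = \xi_r)\}_r$ satisfies the LDP at scale $r^n$ with
good rate function

\[ I_\xi^n(\nu) := \sup\bigl\{\langle f, \nu\rangle - \Lambda_\xi(f) \ ;\ f \in b_m(S^n)\bigr\}. \tag{14} \]

Moreover

\[ I_\xi^n(\nu) = H(\nu\,\|\,\xi\otimes Q) \ \text{ if } \nu_{n-1} = \xi, \qquad = +\infty \ \text{ otherwise}. \tag{15} \]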

Proof of Lemma 2.4: From Lemma 2.2 our family is exponentially tight at scale r,
hence at scale $r^k$ for every k ≥ 1. Owing to Corollary 4.5.27 of Baldi's theorem
([4] p.160), it is then sufficient to study the limits of the cumulant generating
functionals. Let us first notice two well known limits used in the sequel. For
$g \in b_m(E)$

\[ \lim_{a\to 0}\frac{1}{a}\log\mathbb{E}\exp\bigl(a\,g(W)\bigr) = \mathbb{E}\,g(W) \tag{16} \]

\[ \lim_{a\to\infty}\frac{1}{a}\log\mathbb{E}\exp\bigl(a\,g(W)\bigr) = \operatorname{ess\,sup} g(W). \tag{17} \]

a) Prom (9), we have for fc < n 

r-'= log e( exp < rV, = 0 = 

= < r"-''log«exp [r^~^f{;W)] , ^ > 

and its limit, asr— >oois</,^(g)(5> =: < > (see (16). Moreover it is 

possible to replace in the above display ^ by a sequence without change in 

the conclusion. We get the LDP with rate function 

r sup{< f,u > - < > ; / e 6m(E")} = x(- U® <3)- 

b) Prom (10) 

lim r-" log « (exp < r"/,£” > | = ^r) = 

= < log (£exp /(■,!+), ^ >= A^(/), (18) 

which gives the LDP with rate function 3^ given by (14). 

Let us prove that 3^ has the non variational expression (15). If / involves 
only the first n — 1 coordinates, i.e. f{yi,y) = '^(x) then 

</,I/>-A|(/) ^ 

and the supremum taken on is infinite if Un-i ^ So 

hi^) = +0° if rn-i 7^^. (19) 

Now if Vn-i = ^ we may write u = where z> is a regular probability kernel so 
that 

3^{u) = sup / ( [ f{x,y)r{dy\x) - log « exp /(x, IP)) ^{dx) . 

f J'^ri-1 \Jj2 J 

Applying (4) with o = P(-|x) and /? = Q, we get 

3^{y) < [ H (i/(.|x) II Q) ^(dx) = . 

Applying again (4) but with a = v = v and ^ ^ 0 Q we obtain 

H \\ = sup { < ^ 1 , 1 / > - log < e^, ^ (g) Q > } . 

9 



( 20 ) 







Jensen’s inequality yields log < (g) Q > > which leads to 

^ (C ^ ^ IK ^ Q) < 3^(^) • 

With (20) and (19) this yields (15). □ 

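Remark 2.1. This implies uniform conditional LDPs, i.e. that for every closed set
F (resp. open G), every ε > 0 and every μ one can find an open neighborhood $V_\mu$
of μ such that, on the one hand, for k < n

\[ \limsup_{r}\ \sup_{\xi\in V_\mu} r^{-k}\log\mathbb{P}\bigl(L_r^n \in F \mid L_r^{n-1} = \xi\bigr) \le -\inf_{\nu\in F}\chi(\nu \mid \mu\otimes Q) + \varepsilon \tag{21} \]

\[ \liminf_{r}\ \inf_{\xi\in V_\mu} r^{-k}\log\mathbb{P}\bigl(L_r^n \in G \mid L_r^{n-1} = \xi\bigr) \ge -\inf_{\nu\in G}\chi(\nu \mid \mu\otimes Q) - \varepsilon \tag{22} \]

and on the other hand

\[ \limsup_{r}\ \sup_{\xi\in V_\mu} r^{-n}\log\mathbb{P}\bigl(L_r^n \in F \mid L_r^{n-1} = \xi\bigr) \le -\inf_{\nu\in F} I_\mu^n(\nu) + \varepsilon \tag{23} \]

\[ \liminf_{r}\ \inf_{\xi\in V_\mu} r^{-n}\log\mathbb{P}\bigl(L_r^n \in G \mid L_r^{n-1} = \xi\bigr) \ge -\inf_{\nu\in G} I_\mu^n(\nu) - \varepsilon. \tag{24} \]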



Proof of Theorem 2.1: As previously said, the statement a) is known. 

1) If 5 is compact, we prove by induction that for n > 1 
{^(<CJ? € .)}r satisfies the LDP at scale r'^ with rate function . (25) 

Assume that the statement (25) is true forn — 1. Let us first prove the lower bound 
of the (unconditioned) LDP at level n. Let G be an open neighborhood of u such 
that 3J!J(z>') < 00 and let e > 0. We have 

? (i:” € G) > e G I £”-1 = e) (£r' € ttG n K,) (26) 

and from (24) 

liminf r-" log inf ^ (£” e G I = 0 > (27) 

> - inf 3,r^(p) - € > - e . 

p£G 

where tvG = {ttA, A E G} and TGi/ is a neighborhood of ttu. It remains to find a 
lower bound for r~'^ log^(£;^“^ E ttG fl K-i/). Before using the LDP at order n — 1 
we need a very simple lemma which is surely known but for which we don’t know 
any reference. 

Lemma 2.5. Let S be a compact metric space and P a probability measure on S with
full support. Then each non empty open subset O of $M_1(S)$ contains a probability
measure absolutely continuous with respect to P.

Proof of Lemma 2.5: Every probability measure on S may be approximated in the
Lévy metric by a weighted finite sum of atoms. By linearity, it is then sufficient
to work with an open neighbourhood of a Dirac mass $\delta_{x_0}$ for $x_0 \in S$. Now, if
$B_j := \{x \in S : d(x, x_0) < j^{-1}\}$ we have $P(B_j) > 0$ for every j, by definition of the
support of P. The probability measure $P_j = P(\cdot \mid B_j)$ is absolutely continuous with
respect to P and $P_j \Rightarrow \delta_{x_0}$ as j → ∞ (apply the Portmanteau theorem [4] p.356).
□

End of the proof of Theorem 2.1 Thanks to the above lemma, we can pick some 
qQ^~^ in G' := ttG fl K-i/? then for every M > 0 we truncate q into qM — 
{M~^ y q) /\M and set qM =< qM^Q^~^ >~^ qM- The dominated convergence 







theorem implies QmQ^ ^ ^ qQ^ ^ as M — > oo. Choosing M large enough to get 
^ •= QmQ^~^ ^ G', we deduce 

Zz\{i>) = H{vn-i II P „-2 ® 0) < MlogM^ < 00 . 

The LDP at order n — 1 (statement (25) for n — 1) yields 
liininf log^J e ttG n > - inf ^ e ttG n > 

> — M log M ^ for every M > 0, so that lim inf^ r“” log € ttG fl is 0. 

Gathering (26) and (27) we get the good lower bound. 

For the upper bound, we remark that for every Borel subset A of Mi(5’^), 

*P(£”eA) = [ ^ {L’;! e A \ = fl) P G dfi) 

J TV A 

< sup ^ (i:;? e A I ' - m) , (28) 

IjlEttA 

where irA = {tt^ : ^ G A}. 

Since S is compact it is enough to consider only F compact. Following the 
method of [4] p.150-151, we consider for every u e F a, convenient neighborhood 
Ajy of 1 / and get an upper bound for sup^^^^^ (/C^ G A \ — /i), and then 

make a finite covering of F. The details are left to the reader. 

Let us remark that the rate function is 

3nAi^) = [ II Q)un-i{dx) = H{v II Vn-i ^ Q) = . (29) 

We conclude that (25) holds at every level. 

2) Let us now consider intermediate levels. Fix fc > 1 and if fc > 1 assume 
that S is compact. We show by induction that for n > fc 

{^(CP G .)}r satisfies the LDP at scale with rate function . (30) 

The statement is true for from (25) above. Assume that it is true forn — 1. Prom 
(9), we have r“^log^(exp < r^/, > | = fi) = < log (3 exp 

/(•, VF)] ,/i > and its limit, asr— >oois</, //(8)(5>=: < tt/, / i > (see (16). 
Moreover it is possible to replace in the above display // by a sequence ^ 
without changing the conclusion. With the same argument as above, we get a 
conditional LDP with rate function x(-|/^ ^ Q)- Then we use the same line of 
argument as in 1). For the proof of the upper bound, there are two cases. For 
fc > 2 we take into account that S is compact as above. For fc = 1 we take 
into account Lemma 2.2 a). Thanks to the statement (30) for n — 1, we obtain a 
(unconditional) LDP of rate function 

inf x('^Im®< 5) +^r^(M) (31) 

/x€Mn-i 

Now J^(i^) < oo forces u = fi^Q with 3^“^(/i) < oo. By (6) = fik <S> 

hence u = smd in particular Moreover, in this case, (31) yields 

3^{u) = 3^“^(/i) = H{i/k II ^/c-i ^Q) = ^fc(^)? which says that the statement (30) 
holds for n. 

We have ended the proof of Theorem 2.1. □ 

Remark 2.2. 1) The rate function 3^ is very similar to the rate function of 

the empirical measure of k-tuples in a sample of size r of i.i.d. copies of 
W (as it appears in [4] Theorem 6.5.12). 







2) Formula (31) is of the same type as Lemma 2.3 in [9], formula (2) in [2], 
or formula (2.4) in [3]. 

3. Contraction and LDP for masses 

We first study linear functionals of the empirical measure, and then take the 
particular case of masses. We assume in this section that S is compact. 

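Proposition 3.1. Fix n, k ∈ {1, ..., n} and $f \in C(E^n, \mathbb{R})$. The family of distributions
of $\{\langle f, L_r^n\rangle\}_r$ satisfies the LDP at scale $r^k$ and good rate function

\[ I_k(c) = \inf\bigl\{I_k^n(\nu) : \nu \in M_1(S^n),\ \langle f, \nu\rangle = c\bigr\}. \tag{32} \]

Moreover

\[ I_1(c) = \sup_{\theta\in\mathbb{R}}\Bigl\{\theta c - \log\int_S \exp\bigl(\theta\,\pi_{n-1}f(y)\bigr)\,Q(dy)\Bigr\}, \tag{33} \]

\[ I_k(c) = \sup_{\theta\in\mathbb{R}}\Bigl[\theta c - \log\sup_{\mathbf x\in E^{k-1}}\int_S \exp\bigl(\theta\,\pi_{n-k}f(\mathbf x, y)\bigr)\,Q(dy)\Bigr] \quad\text{if } k \ge 2. \tag{34} \]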

Proof: The first claim is a consequence of the contraction principle, since u 

f^y> is continuous. To prove (33), notice that = H{vi || Q) if i/ = z/i (8) Q, 
and then < /, i/ > = < i^i >. 

Fix k > 2. Let ^*(c) the right hand side of (34) so that we have to prove 

Ik=t. (35) 

Set, for X G A G 91, ^ G Mi(S*^-i) 

Ax(A) := logj exp(A 7T„_fc/(x,y))Q((Zy) (36) 

A^(A) := <A.(A),^> and A|(c) := sup(Ac — A^(A)) , (37) 

A 

Prom the obvious equality sup{A^(A) ; ^ G Mi(S^“^)} = sup {Ax (A) ; x G 
and the minimax theorem (the ^ set is compact - see [4] p. 151) we deduce 
£*{c) = supx (Ac — sup^ A^(A)) = sup^^inf^ (Ac — A^(A)) = inf^ A|(c). Let z/ G 

Mi(E’^) such that < f,v >= c and J^(z^) = c. Then v = Vk and < 

Uk >= c where g := TTn-kf- Applying once more (4), we have for x G 

H (^'fe(-|x) \\Q)>xj g{x.,y)vk{dy\:x.) - Ax(A) . (38) 

If i^k-i satisfies J^k_i^j.g(x^y)iyk-i{dx)uk{dy\x) = c, an integration of (38) and 
a maximization in A yield 

f H{uk{.\K) II <5)t'fc_i(dx) > A*^_^(c), 

and taking infimum on i/k-i we get Ik{c) > £*{c). Let us prove the reverse in- 
equality. Fix ^ G Mi(S^“^) and assume A|(c) < oo. The function A^ is convex, 
infinitely differentiable and strictly increasing, and 

$$\Lambda_\xi'(\infty) = \int \operatorname{ess\,sup}\, g(x, W)\, \xi(dx), \qquad \Lambda_\xi'(-\infty) = \int \operatorname{ess\,inf}\, g(x, W)\, \xi(dx).$$







If c G (A^(— oo), A^(oo)), let A the unique solution of A^(A) = c. This A reaches 
the supremum in (37). For r]{dy\x) = exp(A^(x,y) — Ax(A)) Q{dy) 

[ H{rj^x)\\Q)adx) = Al{c). (39) 

For fixed we get the inequality 

A|(c) > inf I / H (? 7 (.|x) || Q)^{dx) ; / </(x,y)^(dx)r/(dy|x) = cl . 

(40) 

If c ^ [A^(-oo), A^((X))], we have A|(c) = oo and (40) holds. If c = A^(oo) < 
oo, we have A^(c) = — f^k-i ^og^(g(x, W) = c) ^{dx) < oo, so that ip(^(x, IF) = 
c) 7 ^ 0 for 1 / a.e. x and choosing 

i/fc((iy|x) = l/(x,y)=c{^J(/(x, VF) = c)}“^ Q{dy) 

we see that (39) holds in this case (and also if c = A^(— oo)). Henceforth (40) 

holds in all cases. Taking infimum on ^ we get i*{c) = inf^A|(c) > Ik{c) and 

finally equation (35) holds, which ends the proof of Proposition 3.1. □ 

In the particular case of masses, we assume (1) and take E = [0, u;]. We 
denote Pn(x) := xiX 2 • • • for x G E^ and take f = Pn in the above proposition. 
This gives the following result, which recovers Proposition 1.2. 

Corollary 3.2. 1 ) The family of distributions of Z'f: satisfies the LDP at scale 

r with good rate function A* . 

2) For 2 < k <n the family of distributions of satisfies the LDP at scale 
r^ with good rate function given by 

Ik{x) = 0 ifxe[w^-\w^-^] 

I^{x) - K\xw-^^-^^)ifx^[w^-\w^) (41) 

7^(x) = k\xw-^^-^^)ifxe[ut.2^~^) 

I^{x) = +00 ifx^[]^,w^]. 

Proof: Start from TTn-kfi'^, y) = Xi-- Xk-iy- With the notation of (36) we have 
Ax(A) = A(xiX 2 • • -Xfc-iA). This gives sup^^^^-i Ax(A) = A(u)^“^A) if A > 0 and 
A(^“^) si A < 0. □ 



4. Gibbs conditioning principle 

We want to characterize typical behaviors along a given branch, by means of the 
Gibbs conditioning principle, ([19] or Chapter 7.3 in [4]). Mass plays the role of 
energy. For 5 > 0, let — {z/ G Mi(E^) : | < Pn^iy > - a | < 5} be the ’’energy 
constraint”. The following lemma identifies the Gibbs states. We assume (1). 

Let Fa = {i/ G Ml (91^) : < Pn.v >= a). For 0 G 91 let Qe{dy) := 

Q{dy), whose mean is A'(0). For every x G (w,w) let 9{x) be the unique solution 
of A'{6) = X. 

Lemma 4.1. 1) When w < a < w and n > 1, = Qe{o) ^ is the unique 

solution of 

( 42 ) 







2) When a 6 [w^ for k>2, ^ <S> Qe(aw~(^-^)) ^ the 

unique solution of 

inf{J^(i/);z.€Fj = J^(i/(“)). (43) 

3) When a 6 = 5^^ ® Qe{aw-<-'‘-^y) ® is the unique 

solution of ( 43 ). 

Here is the main result of this section. 

Theorem 4.2. Fix n> 0 and k <n. Let T he an open neighbourhood of 
limsuplimsup r~^log^(iL^ ^ F | <C^ G As) < 0. 

)-0 r 

Consequently, for any • ,in; the distribution of (Wi^, • • • , Wi^...i^) condi- 

tioned upon Z^!' G [a — S, a 5] converges to v^\ as r oo, followed by S 0. 

Proof of Lemma 4.1: To simplify, we treat only the second case and set a = 
9 = 6{a), z> = From the previous section, we know that 

3Ut>)) = H{Qe\\ Q) = 0a-A{9). 

So z> is a solution. Let p £ Fa another one. Since 5^(/i) < oo we have p = Pk®Q^~^ 
and 

(/^) ~ ^ i^l^k II Pk—1 ^ Q) — H (^pk II Pk — 1 ^ Qo^ ^ I ^kPk{3j^) ^iP) . 

Now since pk is absolutely continuous with respect to pk-i ^ Q, the support of 
Pk{dy\x) is a subset of S. Since < oo we know from Theorem 2.1 b) that 

p = Pk<Z> which gives 

a = < p„,/u > = < pfc,/Ufc > < XkHkidTi). (44) 

Since ^ > 0, we get A*(a) — H (/i^ || pk-i ^Qq) > 9aw~^^~^^ — K{6) = A*(a) 
which implies pk = Pk-i ^ Q$- Carrying into (44) gives = < pk~i,pk-i > 

which forces pk-i = d^k-i. □ 

Proof of Theorem 4.2: Clearly, 

limsuplimsup r“^log^(LP ^ F | G < 

( 5— *>0 r 

< lim limsup r~^ log^ (L!? G F^ fl As) — lim liminf r~^ log?P i£Jl G . 

( 5-^0 r <^-^0 r 

From the upper bound of the LDP (Theorem 2.1 b)), the first term of the right
hand side is less than — lim^_,o inf {H{p\Q^^) ; p e D As}. Since the sets 
F^ n As are closed and nested (see [4] Lemma 4.1.6), the bound is equal to 
— inf [H{p\Q^'^) ; p £ H Fa}, which is strictly smaller than -Ik{a). For the 
second term, we apply the lower bound of the LDP to As so that lim inf r~^ 
log^(£” e As) > -inf{F(p|Q®«) ; p € mtAg} > = -h{a). □ 

Acknowledgement. I thank Wendelin Werner for asking me this question. 







References 

[1] J. Barral. Generalized vector multiplicative cascades. Adv. in Appl. Probab., 33(4):
874-895, 2001.

[2] J.D. Biggins. Large deviations for mixtures.
Available at http://www.shef.ac.uk/~stljdb/.

[3] N.R. Chaganty. Large deviations for joint distributions and statistical applications.
Sankhya, 59:147-166, 1997.

[4] A. Dembo and O. Zeitouni. Large Deviations Techniques and Applications. Springer,
2nd edition, 1998.

[5] J.D. Deuschel and D.W. Stroock. Large deviations. Academic Press, 1989.

[6] I.H. Dinwoodie and S.L. Zabell. Large deviations for exchangeable random vectors.
Ann. Probab., pages 1147-1166, 1992.

[7] P. Dupuis and R. Ellis. A weak convergence approach to the theory of large deviations.
Wiley, 1997.

[8] P. Eichelsbacher and U. Schmock. Large deviations of U-empirical measures in strong
topologies and applications. Ann. Inst. H. Poincaré Probab. Statist., 38(5):779-797,
2002.

[9] W. Finnoff. Integration of large-deviation kernels and applications to large deviations
for evolutionary games. Probab. Theory Relat. Fields, 122:141-162, 2002.

[10] M. Grunwald. Sanov results for Glauber spin-glass dynamics. Probab. Theory Relat.
Fields, 106:187-232, 1996.

[11] J.-P. Kahane and J. Peyrière. Sur certaines martingales de Benoît Mandelbrot.
Advances in Math., 22(2):131-145, 1976.

[12] A. N. Kolmogorov. The local structure of turbulence in incompressible viscous fluid
for very large Reynolds numbers. Proc. Roy. Soc. London Ser. A, 434(1890):9-13,
1991. Translated from the Russian by V. Levin, Turbulence and stochastic processes:
Kolmogorov's ideas 50 years on.

[13] Q. Liu. On generalized multiplicative cascades. Stochastic Process. Appl., 86(2):263-286,
2000.

[14] Q. Liu, E. Rio, and A. Rouault. Limit theorems for Multiplicative Processes. J. of
Theoret. Probab., 16:971-1014, 2003.

[15] Q. Liu and A. Rouault. Limit theorems for Mandelbrot's multiplicative cascades.
Annals of Appl. Probab., 10(1):218-239, 2000.

[16] M. Löwe. Iterated large deviations. Stat. and Probab. Letters, 26(3):219-223,
February 1996.

[17] B. Mandelbrot. Multiplications aléatoires itérées et distributions invariantes par
moyenne pondérée aléatoire: quelques extensions. C. R. Acad. Sci. Paris Sér. A,
278:355-358, 1974.

[18] M. Ossiander and E.C. Waymire. Statistical estimation for multiplicative cascades.
Ann. Statist., 28(6):1533-1560, 2000.

[19] D.W. Stroock and O. Zeitouni. Microcanonical distributions, Gibbs states, and the
equivalence of ensembles. In R. Durrett and H. Kesten, editors, Festschrift in Honour
of F. Spitzer, pages 399-424. Birkhäuser, 1991.

[20] J. Trashorras. Large deviations for a triangular array of exchangeable random
variables. Ann. Inst. H. Poincaré Probab. Statist., 38(5):649-680, 2002.

Alain Rouault 

LAMA, Universite de Versailles-Saint-Quentin 

rouault@math.uvsq.fr




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Partitioning with Piecewise Constant 
Eigenvectors 

Christiane Takacs 



ABSTRACT: In the present paper we investigate the partitioning properties 
of piecewise constant eigenvectors of matrices describing the mutual positions of 
points. We discuss differences arising from choosing different matrices and present 
an example where one has to be careful with the selection of the appropriate 
eigenvectors. 



Let $V \neq \emptyset$ be a finite set of vertices, e.g. points in some space, with $|V| = n$.
In order to describe the mutual positions of the points we introduce the distance
matrix $D = (d_{ij})_{i,j \in V}$, where $d_{ij}$ is the (Euclidean) distance of the points $i$ and $j$.
We assume $D$ to be non-negative and symmetric. Secondly, let $W = (w_{ij})_{i,j \in V}$ be
the similarity matrix, where $w_{ij}$ is the similarity of the points $i$ and $j$, and $w_{ij} = 0$
means that $i$ and $j$ are completely different. We assume $W$ to be non-negative and
symmetric. The matrix W can be derived from D by e.g. Wij = with some 

a > 0. 

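Let $d_i := \sum_{j=1}^{n} d_{ij}$ and $\Delta = \mathrm{diag}(d_1, \ldots, d_n)$ and assume $\Delta$ to be regular with
the scaling $\sum_i d_i = 1$; further let $w_i := \sum_{j=1}^{n} w_{ij}$ and $\Omega = \mathrm{diag}(w_1, \ldots, w_n)$
and assume $\Omega$ to be regular with $\sum_i w_i = 1$. From the computational viewpoint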
the matrix W has to be preferred because in applications it is often a sparse 
matrix. But since my initial contact to the problem was a question concerning the 
clustering properties of piecewise constant eigenvectors of the distance matrix, I 
consider this, too. 
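The setup above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample points, the helper names and in particular the exponential kernel $w_{ij} = \exp(-\alpha d_{ij})$ are choices made here, not taken from the paper. Both matrices are scaled so that their row sums add up to 1, as required above.

```python
import math

def distance_matrix(points):
    """Euclidean distance matrix of a list of coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]

def similarity_matrix(dist, alpha=1.0):
    """ASSUMPTION: exponential kernel w_ij = exp(-alpha * d_ij)."""
    return [[math.exp(-alpha * d) for d in row] for row in dist]

def scale_to_unit_total(matrix):
    """Divide every entry by the grand total, so the row sums add up to 1."""
    total = sum(sum(row) for row in matrix)
    return [[entry / total for entry in row] for row in matrix]

def row_sums(matrix):
    """Diagonal entries of Delta (for D) or Omega (for W)."""
    return [sum(row) for row in matrix]

# Hypothetical sample: two well separated pairs of points in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
raw_distances = distance_matrix(points)
D = scale_to_unit_total(raw_distances)
W = scale_to_unit_total(similarity_matrix(raw_distances, alpha=1.0))
d = row_sums(D)  # d_1, ..., d_n
w = row_sums(W)  # w_1, ..., w_n
```

With this kernel the similarity matrix of real data is dense; sparsity would only appear after thresholding small entries, which is not done here.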

The object $(V, V \times V, D)$ (resp. $(V, V \times V, W)$) is interpreted as a weighted graph
where the points $i$ and $j$ are connected by an edge with weight $d_{ij}$ (resp. $w_{ij}$).
Because of the symmetry of the matrices $D$ and $W$ the graphs are simple and
undirected. In the context of electric networks the weight $w_{ij}$ corresponds to the
conductance of the edge $(i, j)$. The graph corresponds to a finite and reversible
Markov chain with transition probabilities $w_{ij}/w_i$. This Markov chain performs
a transition from one vertex to another with a probability proportional to the
conductance of the connecting edge. A weight $d_{ij}$ could be interpreted as
some sort of electrical resistance, but we will not use this here; instead we interpret
$d_{ij}$ as some (crazy) conductance, too.

Now, given the distances or similarities of points one wants to identify clusters of 
points, i.e. group together points which are near or similar, and put cuts between 
groups of points which are apart or dissimilar. In the literature there are a lot of 
algorithms (see [2] , [6] , [7] for a survey) finding an optimal partition. The spectral 
methods have in common that they use piecewise almost constant eigenvectors as 
indicators for an underlying partition. The goal of the present paper is to compare 
different partitioning criteria and to get a better understanding of their features. 







1. Criteria for Partitioning 



For $A, B \subseteq V$ let $d_{AB} := \sum_{i \in A} \sum_{j \in B} d_{ij}$ and $w_{AB} := \sum_{i \in A} \sum_{j \in B} w_{ij}$. The former
quantifies the distance between the sets $A$ and $B$, whereas the latter represents
their similarity. Due to our scaling we have $d_{VV} = 1$ and $w_{VV} = 1$. Let $|A|$ be the
number of points in $A$. In the following we introduce criteria for finding a partition
$(A, B)$ of the set $V$.



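The average association criterion maximizes $\mathrm{AAss}(A, B) := \frac{w_{AA}}{|A|} + \frac{w_{BB}}{|B|}$.

The average distance criterion minimizes $\mathrm{ADist}(A, B) := \frac{d_{AA}}{|A|} + \frac{d_{BB}}{|B|}$.

The normalized association criterion maximizes $\mathrm{NAss}(A, B) := \frac{w_{AA}}{w_{AV}} + \frac{w_{BB}}{w_{BV}}$.

The normalized distance criterion minimizes $\mathrm{NDist}(A, B) := \frac{d_{AA}}{d_{AV}} + \frac{d_{BB}}{d_{BV}}$.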



Note that each criterion optimizes some average quantity within the partitioning 
subsets, and thus can easily be generalized to partitions with more than two sets. 
Different features of a set $A \subseteq V$ are covered by

$$\mathrm{AAss}\, A := \frac{w_{AA}}{|A|}, \qquad \mathrm{NAss}\, A := \frac{w_{AA}}{w_{AV}},$$

the mean similarity within $A$ and the proportion of the similarity between $A$ and
$V$ covered by the similarities within $A$,

$$\mathrm{ADist}\, A := \frac{d_{AA}}{|A|}, \qquad \mathrm{NDist}\, A := \frac{d_{AA}}{d_{AV}},$$

the mean distance within $A$ and the proportion of the distance between $A$ and $V$
covered by the distances within $A$.

Note that we have $0 \le \mathrm{AAss}\, A \le \mathrm{NAss}\, A \le 1$ and $0 \le \mathrm{ADist}\, A \le \mathrm{NDist}\, A \le 1$,
since $w_{AV} \le w_{VV} = 1 \le |A|$ and likewise for $d$. The average quantities are average for $A = V$ but may be smaller or larger for
smaller sets $A$. The normalized quantities are usually small for small sets $A$ and
maximal for $A = V$. For a partition $(A_1, \ldots, A_k)$ of $V$ we define

$$\mathrm{NAss}(A_1, \ldots, A_k) := \mathrm{NAss}\, A_1 + \ldots + \mathrm{NAss}\, A_k$$

and observe that $0 \le \mathrm{NAss}(A_1, \ldots, A_k) \le k$. Since this estimate depends on $k$
and since usually NAss $A$ is small for small sets $A$ there is no canonical strategy
when comparing optima for partitions with different numbers of elements.



Remark 1.1. One could also optimize quantities between the partitioning subsets.
In [5] the criterion NCut is minimized, where

$$\mathrm{NCut}(A, B) := \frac{w_{AB}}{w_{AV}} + \frac{w_{AB}}{w_{BV}} = 2 - \mathrm{NAss}(A, B).$$

Thus the NCut criterion is equivalent to the NAss criterion. For the normalized
distance cut criterion $\frac{d_{AB}}{d_{AV}} + \frac{d_{AB}}{d_{BV}} \to \max$, we also have

$$\frac{d_{AB}}{d_{AV}} + \frac{d_{AB}}{d_{BV}} = 2 - \mathrm{NDist}(A, B)$$

and thus equivalence to the NDist criterion.

These equivalences suggest how to generalize cut criteria to partitions of $V$ with
more than two elements, namely e.g.

$$\mathrm{NCut}(A_1, \ldots, A_k) := (1 - \mathrm{NAss}\, A_1) + \ldots + (1 - \mathrm{NAss}\, A_k) = \frac{w_{A_1 A_1^c}}{w_{A_1 V}} + \ldots + \frac{w_{A_k A_k^c}}{w_{A_k V}},$$

where $A_i^c = V \setminus A_i$ denotes the complement of $A_i$.
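The equivalence NCut $= 2 -$ NAss for a 2-partition can be checked numerically. The following is a minimal sketch; the matrix `W`, the sets `A`, `B` and all function names are made up for illustration. `W` is not scaled to total weight 1, which does not matter here because the normalized criteria are ratios.

```python
def block_sum(M, rows, cols):
    """w_AB: sum of the entries M[i][j] with i in rows and j in cols."""
    return sum(M[i][j] for i in rows for j in cols)

def nass(W, A, B):
    """NAss(A, B) = w_AA / w_AV + w_BB / w_BV."""
    V = range(len(W))
    return (block_sum(W, A, A) / block_sum(W, A, V)
            + block_sum(W, B, B) / block_sum(W, B, V))

def ncut(W, A, B):
    """NCut(A, B) = w_AB / w_AV + w_AB / w_BV."""
    V = range(len(W))
    cut = block_sum(W, A, B)
    return cut / block_sum(W, A, V) + cut / block_sum(W, B, V)

# Hypothetical symmetric similarity matrix with two loosely coupled pairs.
W = [[4, 3, 1, 0],
     [3, 4, 0, 1],
     [1, 0, 4, 2],
     [0, 1, 2, 4]]
A, B = [0, 1], [2, 3]

nass_value = nass(W, A, B)
ncut_value = ncut(W, A, B)
```

For this example `w_AA = 14`, `w_AV = 16`, `w_BB = 12`, `w_BV = 14` and `w_AB = 2`, so the two values are 97/56 and 15/56 and add up to 2.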







2. Features of vectors 



For $A \subseteq V$ let $1_A$ be a vector with $(1_A)_i = 1$ for $i \in A$ and 0 otherwise.
Since $|A| = 1_A^t 1_A$, $w_{AA} = 1_A^t W 1_A$ and $w_{AV} = 1_A^t \Omega 1_A$, we have the
representations

$$\mathrm{AAss}\, A = \frac{1_A^t W 1_A}{1_A^t 1_A}, \qquad \mathrm{NAss}\, A = \frac{1_A^t W 1_A}{1_A^t \Omega 1_A}$$

and analogous expressions for ADist $A$ and NDist $A$. Therefore we define the
quantities for column vectors $x \in \mathbb{R}^n$. So let

$$\mathrm{AAss}\, x := \frac{x^t W x}{x^t x}, \qquad \mathrm{NAss}\, x := \frac{x^t W x}{x^t \Omega x},$$

$$\mathrm{ADist}\, x := \frac{x^t D x}{x^t x}, \qquad \mathrm{NDist}\, x := \frac{x^t D x}{x^t \Delta x}.$$

Note that each definition coincides with the original definition for $x = 1_A$.
From the definition

$$\mathrm{AAss}\, x = \frac{\sum_{i,j} w_{ij} x_i x_j}{\sum_i x_i^2}$$

we see that AAss $x$ is large if for large $w_{ij}$, i.e. high similarity of the points $i$ and
$j$, the entries $x_i$ and $x_j$ have the same sign. The optimal $x$ has the same sign
everywhere and maximum weight at highly connected vertices. Both features can
be used for clustering. In contrast we observe that

$$\mathrm{ADist}\, x = \frac{\sum_{i,j} d_{ij} x_i x_j}{\sum_i x_i^2}$$

is small (negative) if for large distances of points the corresponding entries of $x$
have different signs. For NAss and NDist there are similar argumentations, but
the following theorem clarifies their features.



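Theorem 2.1. We have

$$\mathrm{NAss}\, x = 1 - \frac{\sum_{i<j} w_{ij} (x_i - x_j)^2}{\sum_i w_i x_i^2},$$

$$\mathrm{NDist}\, x = 1 - \frac{\sum_{i<j} d_{ij} (x_i - x_j)^2}{\sum_i d_i x_i^2}.$$

Proof: The first assertion is proved by

$$\frac{\sum_{i<j} w_{ij} (x_i - x_j)^2}{\sum_i w_i x_i^2} = \frac{1}{2} \frac{\sum_{i,j} w_{ij} (x_i - x_j)^2}{\sum_{i,j} w_{ij} x_i^2} = \frac{1}{2} \frac{\sum_{i,j} w_{ij} (x_i^2 + x_j^2)}{\sum_{i,j} w_{ij} x_i^2} - \frac{\sum_{i,j} w_{ij} x_i x_j}{\sum_{i,j} w_{ij} x_i^2} = 1 - \mathrm{NAss}\, x.$$

For the second assertion change $w$ to $d$.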
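The identity of Theorem 2.1 is easy to confirm numerically. The sketch below evaluates both sides for an arbitrary vector; the matrix, the vector and the function names are illustrative assumptions, and the only property of `W` that is used is its symmetry.

```python
def nass_vector(W, x):
    """NAss x = x^t W x / x^t Omega x, with Omega = diag(row sums of W)."""
    n = len(W)
    w = [sum(row) for row in W]
    numerator = sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    denominator = sum(w[i] * x[i] ** 2 for i in range(n))
    return numerator / denominator

def edge_difference_form(W, x):
    """sum_{i<j} w_ij (x_i - x_j)^2 / sum_i w_i x_i^2."""
    n = len(W)
    w = [sum(row) for row in W]
    numerator = sum(W[i][j] * (x[i] - x[j]) ** 2
                    for i in range(n) for j in range(i + 1, n))
    denominator = sum(w[i] * x[i] ** 2 for i in range(n))
    return numerator / denominator

# Hypothetical symmetric similarity matrix and test vector.
W = [[4, 3, 1, 0],
     [3, 4, 0, 1],
     [1, 0, 4, 2],
     [0, 1, 2, 4]]
x = [1.0, 0.5, -2.0, 3.0]

left_side = nass_vector(W, x)
right_side = 1.0 - edge_difference_form(W, x)
```

The diagonal entries of `W` enter both the numerator of NAss and the row sums, so they cancel in the difference; this is why the edge sum may run over $i < j$ only.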







For a vector $x$ normed by $x^t \Omega x = 1$ the quantity $1 - \mathrm{NAss}\, x$ is the sum of squares
of the differences of $x$ along the edges of the graph, weighted by the $w_{ij}$. This expression is small, i.e.
NAss $x$ large, if $x$ has large differences only where similarities between vertices are
small. NAss $x$ is maximal for $x = 1$, but this is not useful for partitioning. For
$1 - \mathrm{NDist}\, x$ on the other hand we argue that it is large, i.e. NDist $x$ is small,
if $x$ has large differences where distances between vertices are large. While the
Ass-criteria stress the similarities, the Dist-criteria stress the distances.



3. Partitioning with vectors 



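This section links the features of vectors to the criteria for partitioning. For $A, B \subseteq V$
with $A \cap B = \emptyset$ we introduce vectors $x_{AB}$ which indicate the sets $A$ and $B$ by
different signs and are zero otherwise.

With $x_{AB} := \frac{1}{|A|} 1_A - \frac{1}{|B|} 1_B$ we have

$$\mathrm{AAss}(A \cup B) + \mathrm{AAss}\, x_{AB} = \mathrm{AAss}(A, B),$$

$$\mathrm{ADist}(A \cup B) + \mathrm{ADist}\, x_{AB} = \mathrm{ADist}(A, B).$$

With $x_{AB} := \frac{1}{w_{AV}} 1_A - \frac{1}{w_{BV}} 1_B$ we have

$$\mathrm{NAss}(A \cup B) + \mathrm{NAss}\, x_{AB} = \mathrm{NAss}(A, B).$$

With $x_{AB} := \frac{1}{d_{AV}} 1_A - \frac{1}{d_{BV}} 1_B$ we have

$$\mathrm{NDist}(A \cup B) + \mathrm{NDist}\, x_{AB} = \mathrm{NDist}(A, B).$$

Proof: Since all proofs are essentially the same we only prove the third property.
As a consequence of $A \cap B = \emptyset$ we have

$$x_{AB}^t \Omega x_{AB} = \frac{1}{1_A^t \Omega 1_A} + \frac{1}{1_B^t \Omega 1_B}.$$

Now we calculate

$$x_{AB}^t W x_{AB} = \frac{1_A^t W 1_A}{(1_A^t \Omega 1_A)^2} + \frac{1_B^t W 1_B}{(1_B^t \Omega 1_B)^2} - \frac{1}{1_A^t \Omega 1_A \; 1_B^t \Omega 1_B} \left(1_A^t W 1_B + 1_B^t W 1_A\right).$$

Since

$$1_A^t W 1_B + 1_B^t W 1_A = 1_A^t W 1_{A \cup B} - 1_A^t W 1_A + 1_B^t W 1_{A \cup B} - 1_B^t W 1_B$$

we have

$$x_{AB}^t W x_{AB} = \left(\mathrm{NAss}\, A + \mathrm{NAss}\, B\right) \left(\frac{1}{1_A^t \Omega 1_A} + \frac{1}{1_B^t \Omega 1_B}\right) - \frac{1_{A \cup B}^t W 1_{A \cup B}}{1_A^t \Omega 1_A \; 1_B^t \Omega 1_B}$$

and we arrive at

$$\mathrm{NAss}\, x_{AB} = \frac{x_{AB}^t W x_{AB}}{x_{AB}^t \Omega x_{AB}} = \mathrm{NAss}\, A + \mathrm{NAss}\, B - \frac{1_{A \cup B}^t W 1_{A \cup B}}{1_A^t \Omega 1_A + 1_B^t \Omega 1_B}.$$

This together with

$$\frac{1_{A \cup B}^t W 1_{A \cup B}}{1_A^t \Omega 1_A + 1_B^t \Omega 1_B} = \mathrm{NAss}(A \cup B)$$

completes the proof.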
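The third property can be verified numerically even when $A \cup B$ is a proper subset of $V$. A minimal sketch follows; the 5-point matrix `W`, the sets and the helper names are hypothetical and only serve as an illustration.

```python
def block_sum(M, rows, cols):
    """w_AB: sum of the entries M[i][j] with i in rows and j in cols."""
    return sum(M[i][j] for i in rows for j in cols)

def nass_set(W, A):
    """NAss A = w_AA / w_AV."""
    return block_sum(W, A, A) / block_sum(W, A, range(len(W)))

def nass_vector(W, x):
    """NAss x = x^t W x / x^t Omega x."""
    n = len(W)
    w = [sum(row) for row in W]
    numerator = sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return numerator / sum(w[i] * x[i] ** 2 for i in range(n))

def x_ab(W, A, B):
    """x_AB = 1_A / w_AV - 1_B / w_BV (zero outside A and B)."""
    n = len(W)
    w_av = block_sum(W, A, range(n))
    w_bv = block_sum(W, B, range(n))
    x = [0.0] * n
    for i in A:
        x[i] = 1.0 / w_av
    for i in B:
        x[i] = -1.0 / w_bv
    return x

# Hypothetical symmetric similarity matrix; point 4 lies outside A and B.
W = [[4, 3, 1, 0, 1],
     [3, 4, 0, 1, 0],
     [1, 0, 4, 2, 1],
     [0, 1, 2, 4, 2],
     [1, 0, 1, 2, 3]]
A, B = [0, 1], [2, 3]

left_side = nass_set(W, A + B) + nass_vector(W, x_ab(W, A, B))
right_side = nass_set(W, A) + nass_set(W, B)
```

Note that `x_ab` is orthogonal to the all-ones vector with respect to $\Omega$, which is the property used later when the eigenvector 1 is discarded.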

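If $(A, B)$ is a partition of $V$ we have with appropriate $x_{AB}$

$$\mathrm{AAss}(A, B) = \mathrm{AAss}\, V + \mathrm{AAss}\, x_{AB},$$
$$\mathrm{ADist}(A, B) = \mathrm{ADist}\, V + \mathrm{ADist}\, x_{AB},$$
$$\mathrm{NAss}(A, B) = 1 + \mathrm{NAss}\, x_{AB},$$
$$\mathrm{NDist}(A, B) = 1 + \mathrm{NDist}\, x_{AB}.$$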



Since the first terms on the right side do not depend on the partition, the left sides 
are optimized by optimizing the second terms on the right side. Thus the sum of 
two Rayleigh quotients on the left side is actually reduced to the one on the right. 
If we want to partition V into two sets, we use a vector looking like xab for some 
partition (A, B) such that its AAss or NAss is maximal or its ADist or NDist is 
minimal. 

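If $x$ is an eigenvector of $W$ (resp. $D$) with eigenvalue $\lambda$ we have

$$\mathrm{AAss}\, x = \frac{x^t W x}{x^t x} = \frac{x^t (\lambda x)}{x^t x} = \lambda,$$

$$\mathrm{ADist}\, x = \frac{x^t D x}{x^t x} = \frac{x^t (\lambda x)}{x^t x} = \lambda.$$

If $x$ is an eigenvector of $\Omega^{-1} W$ (resp. $\Delta^{-1} D$) with eigenvalue $\lambda$ we have

$$\mathrm{NAss}\, x = \frac{x^t W x}{x^t \Omega x} = \frac{x^t \Omega \Omega^{-1} W x}{x^t \Omega x} = \frac{x^t \Omega (\lambda x)}{x^t \Omega x} = \lambda,$$

$$\mathrm{NDist}\, x = \frac{x^t D x}{x^t \Delta x} = \frac{x^t \Delta (\lambda x)}{x^t \Delta x} = \lambda.$$

So if an eigenvector of $W$ or $\Omega^{-1} W$ with the largest (possible) eigenvalue looks
like some $x_{AB}$, the partition $(A, B)$ is optimal with respect to the corresponding
criterion among all 2-partitions. The same is true for eigenvectors with smallest
eigenvalues of $D$ or $\Delta^{-1} D$.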
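That the Rayleigh quotient AAss $x$ equals the eigenvalue at an eigenvector can be illustrated with a plain power iteration. This is a sketch under assumptions: the matrix `W` and the helper names are invented here, and power iteration is just one simple way to approximate the top eigenvector, not the method of any cited paper.

```python
import math

def matvec(M, x):
    """Matrix-vector product M x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def aass(W, x):
    """AAss x = x^t W x / x^t x (Rayleigh quotient of W)."""
    Wx = matvec(W, x)
    return sum(a * b for a, b in zip(x, Wx)) / sum(a * a for a in x)

def power_iteration(W, steps=200):
    """Approximate the eigenvector of W with the largest eigenvalue."""
    x = [1.0] + [0.0] * (len(W) - 1)
    for _ in range(steps):
        y = matvec(W, x)
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x

# Hypothetical symmetric similarity matrix.
W = [[4, 3, 1, 0],
     [3, 4, 0, 1],
     [1, 0, 4, 2],
     [0, 1, 2, 4]]

x_top = power_iteration(W)
lam_top = aass(W, x_top)
```

For this matrix the largest eigenvalue is $(13 + \sqrt{5})/2 \approx 7.618$, and the computed eigenvector has the same sign everywhere, so it maximizes AAss but does not by itself indicate a partition.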

In [5] the above argumentation for NAss is the basis of a partitioning algo- 
rithm using eigenvectors. The algorithm is analyzed with respect to its running 
time and experimentally tested for several images. The simultaneous use of k 
eigenvectors is proposed in [5], too. The latter algorithm is modified, analyzed and 
tested in [4]. 

But the above argumentation can be made even more precise. 







4. Partitioning with eigenvectors 

On the one hand the eigenbasis of an appropriate matrix is used for optimization 
of a Rayleigh quotient, but on the other hand, as we will see, the eigenvectors 
themselves have additional clustering properties. The matrices $D$ and $W$ are symmetric
and therefore diagonalizable. For $\Delta^{-1} D$ and $\Omega^{-1} W$ this is a consequence
of the following (elementary) lemma and the successive remarks, which also hold
for $W$ replaced by $D$ and $\Omega$ by $\Delta$.

Lemma 4.1. The following assertions are equivalent.

(i) $x$ is a right eigenvector of $\Omega^{-1} W$ with the eigenvalue $\lambda$.

(ii) $x$ is a right eigenvector of $I - \Omega^{-1} W$ with the eigenvalue $1 - \lambda$.

(iii) $(\Omega x)^t$ is a left eigenvector of $\Omega^{-1} W$ with the eigenvalue $\lambda$.

(iv) $\Omega^{1/2} x$ is a right eigenvector of the symmetric matrix $\Omega^{-1/2} W \Omega^{-1/2}$ with
the eigenvalue $\lambda$.

(v) $\Omega^{1/2} x$ is a right eigenvector of the symmetric matrix $\Omega^{-1/2} (\Omega - W) \Omega^{-1/2}$
with the eigenvalue $1 - \lambda$.

Since the Laplacian $\Omega - W$ is symmetric and positive semidefinite, the matrix
$\Omega^{-1/2} (\Omega - W) \Omega^{-1/2}$ is symmetric and positive semidefinite, too. Therefore there
exist $n$ linearly independent (orthogonal) eigenvectors with non-negative eigenvalues.
Because of the equivalence of (i) and (v) the matrix $\Omega^{-1} W$ is diagonalizable,
too, and all eigenvalues are bounded from above by 1. Note that the vector 1 is a
right eigenvector of $\Omega^{-1} W$ corresponding to the eigenvalue 1.
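Assertions (i), (iii) and (iv) of Lemma 4.1 can be checked on a small example. In the sketch below the matrix `W`, the eigenpair and the helper names are assumptions chosen for illustration; the eigenvector $x = (7, 7, -8, -8)$ with eigenvalue $41/56$ was worked out by hand for this particular matrix.

```python
import math

def matvec(M, x):
    """Matrix-vector product M x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def vecmat(x, M):
    """Row vector times matrix, x^t M."""
    return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]

def residual(image, vector, lam):
    """Largest deviation of image from lam * vector."""
    return max(abs(a - lam * b) for a, b in zip(image, vector))

# Hypothetical symmetric similarity matrix with row sums 8, 8, 7, 7.
W = [[4, 3, 1, 0],
     [3, 4, 0, 1],
     [1, 0, 4, 2],
     [0, 1, 2, 4]]
n = len(W)
w = [sum(row) for row in W]
P = [[W[i][j] / w[i] for j in range(n)] for i in range(n)]                  # Omega^{-1} W
S = [[W[i][j] / math.sqrt(w[i] * w[j]) for j in range(n)] for i in range(n)]  # Omega^{-1/2} W Omega^{-1/2}

x = [7.0, 7.0, -8.0, -8.0]   # piecewise constant on {0, 1} and {2, 3}
lam = 41.0 / 56.0

right_residual = residual(matvec(P, x), x, lam)                  # assertion (i)
omega_x = [w[i] * x[i] for i in range(n)]
left_residual = residual(vecmat(omega_x, P), omega_x, lam)       # assertion (iii)
sqrt_omega_x = [math.sqrt(w[i]) * x[i] for i in range(n)]
symmetric_residual = residual(matvec(S, sqrt_omega_x), sqrt_omega_x, lam)  # assertion (iv)
ones_residual = residual(matvec(P, [1.0] * n), [1.0] * n, 1.0)   # eigenvalue 1
```

The eigenvector used here is proportional to $1_A/w_{AV} - 1_B/w_{BV}$ for $A = \{0, 1\}$, $B = \{2, 3\}$, i.e. it has exactly the form of an $x_{AB}$.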

Theorem 4.1. Let $(z_1, \lambda_1), \ldots, (z_n, \lambda_n)$ be the eigenpairs of $W$ (resp. $\Omega^{-1} W$),
where $z_i^t z_j = 0$ (resp. $z_i^t \Omega z_j = 0$) for $i \neq j$. For $x = \sum_i \alpha_i z_i$ we have

$$\mathrm{AAss}\, x = \frac{\sum_i \alpha_i^2\, z_i^t z_i\, \lambda_i}{\sum_i \alpha_i^2\, z_i^t z_i} \qquad \left(\text{resp. } \mathrm{NAss}\, x = \frac{\sum_i \alpha_i^2\, z_i^t \Omega z_i\, \lambda_i}{\sum_i \alpha_i^2\, z_i^t \Omega z_i}\right).$$

We see that, if $x$ is represented as a linear combination of eigenvectors of $W$
(resp. $\Omega^{-1} W$), AAss $x$ (resp. NAss $x$) is a convex combination of the corresponding
eigenvalues, with weights proportional to the squares of the linear factors. To maximize
AAss $x$ (resp. NAss $x$), where $x$ looks like some $x_{AB}$, one will choose the $x$ such
that it can be combined from eigenvectors with the largest possible eigenvalues.
Since the eigenvector 1 of $\Omega^{-1} W$ with the largest possible eigenvalue is useless for
combining an $x_{AB}$, i.e. $1^t \Omega x_{AB} = 0$, one will take the eigenvector to the second
largest eigenvalue. If this almost looks like some $x_{AB}$ its coefficient in the convex
combination of NAss $x_{AB}$ will be large and the partition $(A, B)$ has a good chance
to be optimal.
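The convex-combination statement can be made concrete with a 2 x 2 example. The matrix, its eigenpairs and the coefficients below are hypothetical values picked so that everything can be checked by hand; the weights include the squared norms of the eigenvectors because the eigenvectors are not normalized here.

```python
def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def matvec(M, x):
    """Matrix-vector product M x."""
    return [dot(row, x) for row in M]

def aass(W, x):
    """AAss x = x^t W x / x^t x."""
    return dot(x, matvec(W, x)) / dot(x, x)

# Hypothetical example: W has eigenpairs ((1, 1), 3) and ((1, -1), 1).
W = [[2.0, 1.0],
     [1.0, 2.0]]
eigenpairs = [([1.0, 1.0], 3.0), ([1.0, -1.0], 1.0)]
coefficients = [2.0, 1.0]   # the linear factors alpha_i

# x = sum_i alpha_i z_i
x = [sum(c * z[i] for c, (z, _) in zip(coefficients, eigenpairs)) for i in range(2)]

# Weights alpha_i^2 * z_i^t z_i, normalized to sum 1 in the quotient below.
weights = [c * c * dot(z, z) for c, (z, _) in zip(coefficients, eigenpairs)]
convex_combination = (sum(wt * lam for wt, (_, lam) in zip(weights, eigenpairs))
                      / sum(weights))
rayleigh_quotient = aass(W, x)
```

Here `x = [3.0, 1.0]`, the weights are 8 and 2, and both expressions equal 2.6, a value between the two eigenvalues 1 and 3.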

When we want to minimize ADist (resp. NDist) for some partition and we
have an eigenvector of $D$ (resp. $\Delta^{-1} D$) with a small, i.e. negative, eigenvalue
looking like some $x_{AB}$, its coefficient in the convex combination of ADist $x_{AB}$ (resp. NDist $x_{AB}$) will
be large and the partition $(A, B)$ has a good chance to be optimal.

Theorem 4.2. Let $(z_1, \lambda_1), \ldots, (z_n, \lambda_n)$ be the eigenpairs of $W$ (resp. $\Omega^{-1} W$),
where $z_i^t z_j = 0$ (resp. $z_i^t \Omega z_j = 0$) for $i \neq j$. If $(A_1, \ldots, A_k)$ is a partition of
$V$ such that the indicator vectors $1_{A_i}$ are within the linear span of $(z_1, \ldots, z_m)$,
then AAss $(A_1, \ldots, A_k)$ (resp. NAss $(A_1, \ldots, A_k)$) is a $k$-fold convex combination
of $\lambda_1, \ldots, \lambda_m$.






Proof: Since each indicator vector $1_{A_i}$ linearly combines from the eigenvectors,
its AAss $1_{A_i}$ is a (1-fold) convex combination of $\lambda_1, \ldots, \lambda_m$. Thus

$$\mathrm{AAss}(A_1, \ldots, A_k) = \sum_{i=1}^{k} \mathrm{AAss}\, 1_{A_i}$$

is a $k$-fold convex combination.

So for finding more than two clusters simultaneously one will choose few 
piecewise (almost) constant eigenvectors with large eigenvalues. Then the indicated 
partition has a good chance to be optimal. A completely analogous argumentation 
holds for the eigenvectors of $D$ and $\Delta^{-1} D$.

The following theorem, which is intimately related to proposition 2 of [4], 
gives necessary and sufficient conditions for a matrix to have piecewise constant 
eigenvectors, and thus tells us when the above algorithms are exact. 



Theorem 4.3. Let $M$ be a matrix and $(A_1, \ldots, A_k)$ a partition of its index
set $\{1, \ldots, n\}$. Then we have:

$M$ has the eigenvalues $\lambda_1, \ldots, \lambda_k$ and the corresponding right eigenvectors are
linearly independent and piecewise constant on $(A_1, \ldots, A_k)$

iff for all $r, s = 1, \ldots, k$ we have

$$\sum_{\sigma \in A_s} m_{i\sigma} = \sum_{\sigma \in A_s} m_{j\sigma} \quad \text{for all } i, j \in A_r$$

and

$$(\hat{m}_{rs})_{r,s=1,\ldots,k} \quad \text{where } \hat{m}_{rs} := \sum_{\sigma \in A_s} m_{i\sigma} \text{ for } i \in A_r$$

can be diagonalized with the same eigenvalues $\lambda_1, \ldots, \lambda_k$.



Proof: "$\Rightarrow$" In this proof let $V := (v_1, \ldots, v_k)$ be the matrix of the eigenvectors of
$M$ corresponding to the eigenvalues $(\lambda_1, \ldots, \lambda_k)$ and $\Lambda := \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$. Let
$R = (1_{A_s})_{s=1,\ldots,k} \in \mathbb{R}^{n \times k}$ be a matrix whose $s$-th column is the indicator function
of $A_s$ and $W \in \mathbb{R}^{k \times n}$ a matrix whose $s$-th row is a probability measure concentrated
on $A_s$. Note that $WR = I$.

Since each eigenvector $v$ is piecewise constant on the $A_s$, it has a representation
$v = Ru$ where $u \in \mathbb{R}^k$. The matrix $U := (u_1, \ldots, u_k)$ is a square matrix with
$V = RU$ and, since $V$ has $k$ linearly independent columns, $U$ is regular. Putting
all together we have

$$MV = V\Lambda$$
$$MRU = RU\Lambda$$
$$WMRU = U\Lambda$$
$$WMR = U \Lambda U^{-1}.$$

Since the $i$-th row of $MR$ equals the $i$-th sums of the entries of $M$ within the
partition, i.e. $(MR)_{i\cdot} = \left(\sum_{\sigma \in A_s} m_{i\sigma}\right)_{s=1,\ldots,k}$, and the probability measures in $W$
are arbitrary, but the above right side does not depend on them, we have

$$(MR)_{i\cdot} = (MR)_{j\cdot} \quad \text{for } i, j \in A_r,\ r = 1, \ldots, k.$$

Thus the quantities $\hat{m}_{rs} := \sum_{\sigma \in A_s} m_{i\sigma}$ where $i \in A_r$ are uniquely defined by

$$(\hat{m}_{rs})_{r,s=1,\ldots,k} = W M R = U \Lambda U^{-1},$$

i.e. $(\hat{m}_{rs})_{r,s=1,\ldots,k}$ can be diagonalized with the eigenvalues $\lambda_1, \ldots, \lambda_k$.

"$\Leftarrow$" Let $u$ be an eigenvector of $(\hat{m}_{rs})_{r,s=1,\ldots,k}$ with eigenvalue $\lambda$. First we show
that $Ru$ is an eigenvector of $M$ with eigenvalue $\lambda$. We have

$$(\hat{m}_{rs})_{r,s=1,\ldots,k}\, u = WMRu = \lambda u.$$

Since $W$ is arbitrary the rows of the matrix $MR$ are constant in each element of
the partition and consequently

$$MRu = \lambda Ru.$$

Thus the matrix $M$ has $k$ linearly independent eigenvectors $Ru_1, \ldots, Ru_k$ which
are piecewise constant on $(A_1, \ldots, A_k)$.
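The condition of Theorem 4.3 is straightforward to test in code. The sketch below builds the quotient matrix $(\hat{m}_{rs})$ when the block row sums are constant and lifts its eigenvectors to piecewise constant eigenvectors of $M$. The example matrix, its partition, the hand-computed eigenpairs of the quotient and all function names are assumptions for illustration.

```python
def quotient_matrix(M, parts, tol=1e-12):
    """Return (m_hat[r][s]) if the block row sums are constant on each part, else None."""
    k = len(parts)
    m_hat = [[0.0] * k for _ in range(k)]
    for r, part_r in enumerate(parts):
        for s, part_s in enumerate(parts):
            sums = [sum(M[i][j] for j in part_s) for i in part_r]
            if max(sums) - min(sums) > tol:
                return None
            m_hat[r][s] = sums[0]
    return m_hat

def lift(u, parts, n):
    """R u: the vector that equals u[r] on the r-th part."""
    x = [0.0] * n
    for r, part in enumerate(parts):
        for i in part:
            x[i] = u[r]
    return x

def matvec(M, x):
    """Matrix-vector product M x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

# Hypothetical (non-symmetric) matrix with constant block row sums.
M = [[2.0, 0.0, 1.0, 1.0],
     [0.5, 1.5, 2.0, 0.0],
     [1.0, 0.0, 0.0, 3.0],
     [0.0, 1.0, 2.0, 1.0]]
parts = [[0, 1], [2, 3]]

m_hat = quotient_matrix(M, parts)   # [[2, 2], [1, 3]]
# Eigenpairs of the 2 x 2 quotient, computed by hand for this example.
quotient_eigenpairs = [([1.0, 1.0], 4.0), ([-2.0, 1.0], 1.0)]

residuals = []
for u, lam in quotient_eigenpairs:
    x = lift(u, parts, len(M))
    Mx = matvec(M, x)
    residuals.append(max(abs(a - lam * b) for a, b in zip(Mx, x)))
```

If the block row sums differ within a part, `quotient_matrix` returns `None` and no piecewise constant eigenvectors on that partition can be expected from the theorem.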

When we have piecewise constant eigenvectors of $D$ the preceding theorem
tells us that points are similar if their distances to the sets of the partition indicated
by piecewise constant eigenvectors are equal. So the mean distances from each point
to the classes have to be equal to the mean distances between the corresponding
classes, i.e.

$$\frac{d_{iA_s}}{|A_s|} = \frac{d_{A_r A_s}}{|A_r|\,|A_s|} \quad \text{for all } i \in A_r,\ r = 1, \ldots, k.$$

Note that the mean is taken with respect to the number of points. If the piecewise
constant eigenvectors correspond to the minimal eigenvalues, ADist will be
minimal.

For $\Delta^{-1} D$ the above theorem tells us that points are similar if their
distance-ratios are equal, i.e.

$$\frac{d_{iA_s}}{d_{iV}} = \frac{d_{jA_s}}{d_{jV}} \quad \text{for all } i, j \in A_r,\ r = 1, \ldots, k.$$

Note that due to the normalizing property of $\Delta^{-1}$ this requirement is less strict
than the above. It explains why points in the margin are more likely to be cut
off by the eigenvectors of $D$. To optimize NDist we use the eigenvectors of $\Delta^{-1} D$
corresponding to the smallest eigenvalues.

Points belong to the same class with respect to piecewise constant eigenvectors
of $W$ if their similarities to the other classes indicated by piecewise constant
eigenvectors are equal. So the mean similarity from each point to each class has
to be equal to the mean similarity between the corresponding classes, i.e.

$$\frac{w_{iA_s}}{|A_s|} = \frac{w_{A_r A_s}}{|A_r|\,|A_s|} \quad \text{for all } i \in A_r,\ r = 1, \ldots, k.$$

This criterion requires a very high similarity within the classes and thus favors
small classes. It is also quite likely to cut off marginal points (see [5] for
experiments). To optimize AAss we use the eigenvectors of $W$ corresponding to the
largest eigenvalues. If for example $w_{A_r A_s} = 0$ for $r \neq s$, we obviously have $k$
classes, but points within one class are similar with respect to the present
criterion only if they have the same similarity within the class. This strictness is
weakened in the following NAss setting.

For points in one class with respect to piecewise constant eigenvectors of
$\Omega^{-1} W$ their similarity-ratios are equal, i.e.

$$\frac{w_{iA_s}}{w_{iV}} = \frac{w_{jA_s}}{w_{jV}} \quad \text{for all } i, j \in A_r,\ r = 1, \ldots, k.$$

Note that the similarity-ratio $\frac{w_{iA_s}}{w_{iV}}$ equals the transition probability that the
Markov chain on the $W$-weighted graph moves from the state $i$ to class $A_s$. Since







these transition probabilities have to be equal for all points in each class, the 
Markov chain admits a factorization with respect to the partition (Ai , .. .,Ak). A 
factorization is possible if, for example, the state space of the Markov chain consists 
of k non-communicating classes. Then the eigenvalues corresponding to the piece- 
wise constant eigenvectors are all equal to one. In most applications we will find 
only piecewise almost constant eigenvectors with eigenvalues near one indicating 
that the graph corresponds to an almost reducible Markov chain. This situation 
arises when the graph consists of several classes of very similar points, where the 
classes themselves are far apart. Because of perturbation theory these classes are 
found by the eigenvectors of $\Omega^{-1} W$ very well (see [1] and [3] for failure estimates
under certain conditions). To optimize NAss we use (almost) piecewise constant
eigenvectors of $\Omega^{-1} W$ corresponding to large eigenvalues. Since by lemma 4.1 each
right eigenvector of $\Omega^{-1} W$ corresponds to a left eigenvector with the same sign

structure and the same eigenvalue, which can be interpreted as a (sub) invariant 
signed measure, a large eigenvalue tells us that from the signed measure there is 
only a small part which is cancelled by one step of the Markov chain. Thus the 
Markov chain tends to stay within the sets of the partition. But sometimes the 
change in the subinvariant measure is small compared to the number of transitions 
between the sets, this occurs when the changes themselves cancel out each other. 



5. Oscillations of Eigenvectors 



For certain arrangements of points in space, i.e. points along lines, circles or band
structures in general, one observes optimizing eigenvectors of $\Omega^{-1} W$ changing
gradually along these figures. This is a consequence of theorem 2.1, i.e.

$$1 - \mathrm{NAss}\, x = \frac{\sum_{i<j} w_{ij} (x_i - x_j)^2}{\sum_i w_i x_i^2}$$

and NAss $x = \lambda$ for an eigenvector $x$ with eigenvalue $\lambda$. Since for band structures
the $w_{ij}$ are large for neighboring points and small otherwise, the non-constant
optimal vector x changes gradually along the band structure. The next optimal 
vector x' does almost the same but has a phase shift or changes faster and so on. If 
there are parallel structures or concentric circles we also observe piecewise (almost) 
constant eigenvectors. The corresponding eigenvalues depend on the similarity 
between the parallel structures. The latter is a consequence of theorem 4.3, since 
each line is, with the exception of the points near the boundary, an equivalence 
class of the corresponding Markov chain. 

There is a similar argumentation for the eigenvalues of $\Delta^{-1} D$ with the difference
that in this case the large distances together with big changes of $x$ dominate
the sum, where the $w$'s are replaced by $d$'s.

The above arguments can be made rigorous for a simplified model without 
parallel structures. 



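Lemma 5.1. A matrix $M = (m_{ij})_{i,j=1,\ldots,n}$ with $m_{i'j'} = m_{ij}$ for $|i' - j'| \in \{|i - j|,\ n - |i - j|\}$ has the eigenvectors

$u_\nu$ with $(u_\nu)_k = \cos\left(k \frac{2\pi\nu}{n}\right)$, where $\nu = 0, \ldots, \left\lfloor \frac{n-1}{2} \right\rfloor$,

$v_\nu$ with $(v_\nu)_k = \sin\left(k \frac{2\pi\nu}{n}\right)$, where $\nu = 1, \ldots, \left\lfloor \frac{n-1}{2} \right\rfloor$.

The corresponding eigenvalues are

$$\lambda_\nu = \sum_{i=0}^{n-1} \mu_i \cos\left(i \frac{2\pi\nu}{n}\right), \quad \text{where } \mu_i := m_{1,1+i}.$$

Proof: The $k$-th line of

$$\begin{pmatrix}
\mu_0 & \mu_1 & \mu_2 & \cdots & \mu_2 & \mu_1 \\
\mu_1 & \mu_0 & \mu_1 & \ddots & & \mu_2 \\
\mu_2 & \mu_1 & \ddots & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & \mu_1 & \mu_2 \\
\mu_2 & & \ddots & \mu_1 & \mu_0 & \mu_1 \\
\mu_1 & \mu_2 & \cdots & \mu_2 & \mu_1 & \mu_0
\end{pmatrix} u_\nu$$

simplifies to

$$\mu_0 \cos\left(k \tfrac{2\pi\nu}{n}\right) + \sum_{i \ge 1} \mu_i \left(\cos\left((k - i) \tfrac{2\pi\nu}{n}\right) + \cos\left((k + i) \tfrac{2\pi\nu}{n}\right)\right)$$

$$= \cos\left(k \tfrac{2\pi\nu}{n}\right) \left(\mu_0 + 2 \sum_{i \ge 1} \mu_i \cos\left(i \tfrac{2\pi\nu}{n}\right)\right)$$

$$= \cos\left(k \tfrac{2\pi\nu}{n}\right) \left(\mu_0 + \sum_{i=1}^{n-1} \mu_i \cos\left(i \tfrac{2\pi\nu}{n}\right)\right),$$

which proves the assertion concerning the $u_\nu$. The assertion concerning the $v_\nu$ is
proved in the same way by use of $\sin t = \cos(t - \pi/2)$.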

If the matrix $D$ (resp. $W$) is of this form, then so is $\Delta^{-1/2} D \Delta^{-1/2}$ (resp.
$\Omega^{-1/2} W \Omega^{-1/2}$). Note that for all involved matrices we have > 0 and
in our setting the diagonal elements of $\Delta$ (resp. $\Omega$) are all equal to $1/n$. Since by
lemma 4.1 (in the $(D, \Delta)$ version) a right eigenvector $x$ of $\Delta^{-1} D$ arises from the
eigenvector $\Delta^{1/2} x$ of $\Delta^{-1/2} D \Delta^{-1/2}$, we have the same eigenpairs for $\Delta^{-1} D$ and for
$\Delta^{-1/2} D \Delta^{-1/2}$. The matrix $D$ has the same eigenvectors, too, but the eigenvalues
are different. The eigenpairs of $\Omega^{-1} W$ and $\Omega^{-1/2} W \Omega^{-1/2}$ coincide, $W$ has the
same eigenvectors but different eigenvalues.

A distance matrix $D$ of this form arises when we arrange points at the successive
vertices of a regular $n$-gon and label them according to this arrangement. If $w_{ij}$ is
derived from $d_{ij}$, the matrix $W$ has the same structure. Thus by the above lemma
in such an arrangement we have oscillating eigenvectors, where the corresponding
eigenvalues appear twice.
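The regular $n$-gon case can be checked directly. The following sketch builds the (unscaled) Euclidean distance matrix of the vertices of a regular heptagon and verifies that the cosine and sine vectors are eigenvectors sharing one eigenvalue. The values of `n` and `nu`, the 0-based indexing and the helper names are choices made for this illustration; scaling $D$ to total weight 1 would only rescale the eigenvalue.

```python
import math

n, nu = 7, 2   # hypothetical choice: heptagon, second oscillation mode

# Vertices of a regular n-gon on the unit circle, labelled along the circle.
points = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
          for k in range(n)]
D = [[math.dist(p, q) for q in points] for p in points]

mu = D[0]                         # first row: mu_0, ..., mu_{n-1}
phi = 2 * math.pi * nu / n
lam = sum(mu[i] * math.cos(i * phi) for i in range(n))

u = [math.cos(k * phi) for k in range(n)]   # cosine eigenvector
v = [math.sin(k * phi) for k in range(n)]   # sine eigenvector

def residual(M, x, eigenvalue):
    """Largest entry of |M x - eigenvalue * x|."""
    size = len(x)
    return max(abs(sum(M[k][j] * x[j] for j in range(size)) - eigenvalue * x[k])
               for k in range(size))

cosine_residual = residual(D, u, lam)
sine_residual = residual(D, v, lam)
```

Both residuals vanish up to rounding, so the eigenvalue `lam` has (at least) a two-dimensional eigenspace spanned by an oscillating cosine and sine vector, in line with the remark that these eigenvalues appear twice.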






Figure 1. Set of points in the plane





Figure 2. Some eigenvectors and eigenvalues of $\Omega^{-1/2} W \Omega^{-1/2}$. Six panels, each
titled {eigenvector number, eigenvalue}: {1, 1.}, {2, 0.305217}, {3, 0.305217},
{4, 0.23102}, {5, 0.137017}, {15, 0.0253075}. In each panel the horizontal axis runs
over the point numbers (ticks 10 to 60) and the vertical axis over the eigenvector
entries (ticks -0.4 to 0.4).



Example 5.2. Oscillating and piecewise constant eigenvectors
If we want to separate the central points (1-20) of the picture from the outer circles
(21-40, 41-60), the first three eigenvectors are of no use. Only the fourth eigenvector
does the correct separation; together with the first and the 15-th eigenvector we
have the complete splitting of the concentric circles.



6. Conclusions 

Piecewise constant eigenvectors of any of the considered matrices are quite well
suited for partitioning. Eigenvectors of normalized matrices have better
interpretations. Uncritical application of, say, sign based partitioning with the first
eigenvectors may produce nonsense in the presence of band structures. Whether there are
situations where the distance based partition, despite its disadvantage of a
non-sparse matrix, is preferable has to be investigated.



References 

[1] Deuflhard P., Huisinga W., Fischer A., Schütte Ch. (2000), Identification of almost
invariant aggregates in reversible nearly uncoupled Markov chains, Linear Algebra
and its Applications 315, 36-59

[2] Kannan R., Vempala S., Vetta A. (2000), On Clusterings - Good, Bad and Spectral,
Proceedings of 41st Annual Symposium on Foundations of Computer Science

[3] Ng A., Jordan M., Weiss Y. (2001), On Spectral Clustering: Analysis and an
Algorithm, Advances in Neural Information Processing Systems 14: Proceedings

[4] Meila M., Shi J. (2001), A Random Walks View of Spectral Segmentation,
Proceedings of the International Workshop on AI and Statistics

[5] Shi J., Malik J. (2000), Normalized Cuts and Image Segmentation, IEEE
Transactions on Pattern Analysis and Machine Intelligence (PAMI)

[6] Seary A., Richards W. (2003), Spectral Methods for Analyzing and Visualizing
Networks: An Introduction. In Dynamic Social Network Modeling and Analysis:
Workshop Summary and Papers.

[7] Weiss Y. (1999), Segmentation using eigenvectors: A unifying view. In International
Conference on Computer Vision

Christiane Takacs 

Department of Stochastics, University of Linz, Austria 
christiane.takacs@jku.at 




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Yaglom Type Limit Theorem for Branching 
Processes in Random Environment 

Vladimir Vatutin and Elena Dyakonova 



ABSTRACT: Let $Z_n$ be the number of particles at moment $n$ in a branching
process in random environment with iid probability generating functions. Imposing
Spitzer's condition on the associated random walk and applying the quenched
approach we find the asymptotics of the survival probability of the process up to
moment $n$ as $n \to \infty$ and prove a Yaglom type limit theorem (under appropriate
scaling) for $Z_n$ conditioned on $Z_n > 0$.¹



1. Introduction and main results 



A branching process in random environment (BPRE) is a natural generalization of
the simple Galton-Watson process. At each generation the offspring distribution
is chosen according to a probability distribution determined by the environmental
sequence $\{\pi_n\}$. In the present paper we use the so-called quenched approach to
study the asymptotic behavior of the survival probability up to moment $n$ and the
size of the population at moment $n$ conditioned on the survival of the process up to
this moment for a wide class of BPRE's that includes, in particular, the classical
critical BPRE's. Our results generalize and refine the corresponding statements
established in [7] and [8].
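
The verbal description of a BPRE can be turned into a small simulation. The sketch below is an illustration only and is not taken from the paper: it assumes, for concreteness, geometric offspring laws with $P(k \text{ children}) = p(1-p)^k$, where the parameter $p$ is drawn independently for each generation and uniformly from $[0.3, 0.7]$. All function names are hypothetical.

```python
import random

def sample_environment(rng):
    # Assumed environment law (not from the paper): the parameter p of a
    # geometric offspring distribution, uniform on [0.3, 0.7].
    return rng.uniform(0.3, 0.7)

def sample_offspring(p, rng):
    # Geometric law on {0, 1, 2, ...}: P(k children) = p * (1 - p)**k.
    k = 0
    while rng.random() >= p:
        k += 1
    return k

def simulate_bpre(generations, rng):
    # Returns [Z_0, Z_1, ..., Z_n] with Z_0 = 1. All particles of one
    # generation reproduce according to the same randomly drawn law.
    z = 1
    path = [z]
    for _ in range(generations):
        p = sample_environment(rng)
        z = sum(sample_offspring(p, rng) for _ in range(z))
        path.append(z)
    return path
```

With this symmetric choice of $p$ the logarithm of the offspring mean, $\ln((1-p)/p)$, has expectation zero, so the associated random walk of the example has zero drift.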

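Let $\{Z_n, n \in \mathbb{N}_0\}$ be a BPRE in which the states of the environment are
specified by infinite-dimensional vectors $\{\pi_n, n \in \mathbb{N}_0\}$, where

$$\pi_n = \left(\pi_n^{(0)}, \pi_n^{(1)}, \pi_n^{(2)}, \ldots\right), \quad \pi_n^{(i)} \ge 0, \quad \sum_{i=0}^{\infty} \pi_n^{(i)} = 1, \quad n \in \mathbb{N}_0 = \{0,1,2,\ldots\}, \tag{1}$$

and the tuples $\pi_n$ are independent and identically distributed. Given the environment
the evolution of the BPRE is described by the relations

$$Z_0 = 1, \quad \mathbf{E}_\pi\left[s^{Z_{n+1}} \mid f_0, f_1, \ldots, f_n; Z_0, Z_1, \ldots, Z_n\right] = (f_n(s))^{Z_n},$$

in which

$$f_n(s) = \sum_{i=0}^{\infty} \pi_n^{(i)} s^i \tag{2}$$

is the offspring generating function of particles of the n-th generation.

Let $X_k := \ln f'_{k-1}(1)$, $\eta_k := f''_{k-1}(1)\,(f'_{k-1}(1))^{-2}$, $k \in \mathbb{N} = \{1,2,\ldots\}$;

$$X := X_1, \quad S_0 := 0, \quad S_n := X_1 + \cdots + X_n, \quad n \ge 1.$$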

In the sequel $\{S_n\}_{n \ge 0}$ is called the associated random walk. We suppose that
the associated random walk satisfies Spitzer's condition:



¹Supported in part by grants RFBR 02-01-00266, RFBR-DFG 02-01-04002 and Russian Scientific
School 1758.2003.1







Condition A1. There exists $0 < \rho < 1$ such that

$$\frac{1}{n}\sum_{m=1}^{n} \mathbf{P}(S_m > 0) \to \rho, \quad n \to \infty,$$

or, what is the same (see [4]), $\lim_{n\to\infty} \mathbf{P}(S_n > 0) = \rho$.

Condition A1*. The distribution of $X$ is absolutely continuous and condition
A1 holds.
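
Condition A1 can be explored numerically. The following sketch (an illustration with a hypothetical function name, assuming standard Gaussian steps, which is an assumption of the example and not of the paper) estimates the average $n^{-1}\sum_{m=1}^{n}\mathbf{P}(S_m > 0)$ by Monte Carlo. For a symmetric continuous step law each probability equals $1/2$, so the estimate should be close to $\rho = 1/2$.

```python
import random

def spitzer_average(n, trials, rng):
    # Monte Carlo estimate of (1/n) * sum_{m=1}^{n} P(S_m > 0)
    # for a random walk with standard Gaussian steps (assumed here).
    positive = 0
    for _ in range(trials):
        s = 0.0
        for _ in range(n):
            s += rng.gauss(0.0, 1.0)
            if s > 0:
                positive += 1
    return positive / (n * trials)
```

A step law with nonzero drift or with asymmetric heavy tails would be plugged in at the line that calls `rng.gauss`.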

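Our next assumption is related to truncated moments of the distributions $\pi_n$:

$$\vartheta(a) := \sum_{k=a}^{\infty} k^2 \pi_0^{(k)} \Big/ \left(\sum_{k=0}^{\infty} k\, \pi_0^{(k)}\right)^{2}, \quad a \in \mathbb{N}_0.$$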

To formulate the restrictions we impose on $\vartheta(a)$ some additional functions
characterizing $\{S_n\}_{n \ge 0}$ are needed.

Let $\gamma_0 := 0$, $\gamma_{j+1} := \min(n > \gamma_j : S_n < S_{\gamma_j})$ and $\Gamma_0 := 0$,
$\Gamma_{j+1} := \min(n > \Gamma_j : S_n > S_{\Gamma_j})$, $j \ge 0$, be the strict descending and ascending ladder
epochs of $\{S_n\}_{n \ge 0}$. Introduce the functions

$$V(x) := \sum_{j=0}^{\infty} \mathbf{P}(S_{\gamma_j} > -x), \ x > 0, \quad V(0) = 1, \quad V(x) = 0, \ x < 0,$$

and

$$U(x) := \sum_{j=0}^{\infty} \mathbf{P}(S_{\Gamma_j} < x), \ x > 0, \quad U(0) = 1, \quad U(x) = 0, \ x < 0.$$

If condition A1 holds then $S_n$ is an oscillating random walk. Hence, by
Lemma 1 in [6], $U(x)$ and $V(x)$ are harmonic functions, that is,

$$\mathbf{E}U(x - X) = U(x), \quad \mathbf{E}V(x + X) = V(x), \quad x > 0. \tag{3}$$
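
The strict ladder epochs are easy to read off a sampled path. A minimal sketch, with a hypothetical function name and the path given as the list $[S_0, S_1, \ldots, S_n]$, $S_0 = 0$:

```python
def strict_ladder_epochs(walk):
    # walk = [S_0, S_1, ..., S_n]. Returns two lists of indices:
    # the strict descending epochs (new strict minima) and the
    # strict ascending epochs (new strict maxima), both starting at 0.
    descending, ascending = [0], [0]
    for n in range(1, len(walk)):
        if walk[n] < walk[descending[-1]]:
            descending.append(n)
        if walk[n] > walk[ascending[-1]]:
            ascending.append(n)
    return descending, ascending
```

For the path `[0, 1, -1, -1, 2, -3]` the descending epochs are `[0, 2, 5]` and the ascending epochs are `[0, 1, 4]`; the repeated value $-1$ at index 3 is not a strict descending epoch.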

Condition A2. There are numbers 6 > 0 and a G No such that 

E[(log+ 0(a)) < oo and E[V(X)(log+ 0(a))^+"] < oo, (4) 

E[(log+ 0(a)) < oo and E[C/(-X)(log+ 0(a))^+^] < oo. (5) 

Denote by $\mathcal{L} = \{L\}$ the set of all proper probability laws $L(\xi)$ of nonnegative
random variables and let $\mathcal{L}^+ \subset \mathcal{L}$ be the set of the probability laws in $\mathcal{L}$ strictly
concentrated on $(0,\infty)$: $\mathbf{P}(\xi > 0) = 1$. Let, further, $\mathbf{\Phi} = \{\Phi\}$ and $\mathbf{\Phi}^+ \subset \mathbf{\Phi}$ be the
metric spaces of the Laplace transforms $\Phi(\lambda) = \int_0^{\infty} e^{-\lambda x} L(dx)$, $\lambda \in [0,\infty)$, of the
laws from $\mathcal{L}$ and $\mathcal{L}^+$, respectively, endowed with the metric

$$d(\Phi_1, \Phi_2) = \sup_{1 \le \lambda \le 2} |\Phi_1(\lambda) - \Phi_2(\lambda)|.$$

Since Laplace transforms are completely determined by their values in any interval
of the positive half-line, convergence $\Phi_n \to \Phi$ as $n \to \infty$ in metric $d$ is equivalent
to weak convergence $L_n \to L$ of the respective probability laws.

Put, finally,

$$f_{k,n}(s) := f_k(f_{k+1}(\ldots(f_{n-1}(s))\ldots)), \quad 0 \le k \le n-1, \quad f_{n,n}(s) := s,$$

$$f_{n,0}(s) := f_{n-1}(f_{n-2}(\ldots(f_0(s))\ldots)), \quad n \ge 1.$$
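
The two kinds of composition differ only in the order in which the generating functions are applied. The sketch below spells this out under an assumed toy environment (one child with probability $m$, none otherwise, so that $f(s) = 1 - m + ms$); this offspring law and all function names are assumptions of the example. It also evaluates the quenched survival probability $\mathbf{P}_\pi(Z_n > 0) = 1 - f_{0,n}(0)$.

```python
def compose_forward(fs, k, n, s):
    # f_{k,n}(s) = f_k(f_{k+1}(...f_{n-1}(s)...)), and f_{n,n}(s) = s.
    for j in range(n - 1, k - 1, -1):
        s = fs[j](s)
    return s

def compose_backward(fs, n, s):
    # f_{n,0}(s) = f_{n-1}(f_{n-2}(...f_0(s)...)).
    for j in range(n):
        s = fs[j](s)
    return s

def bernoulli_pgf(m):
    # Assumed toy offspring law: one child with probability m, else none.
    return lambda s: 1.0 - m + m * s

def survival_probability(fs, n):
    # Quenched survival probability 1 - f_{0,n}(0) for the given environment.
    return 1.0 - compose_forward(fs, 0, n, 0.0)
```

In this toy case the survival probability is simply the product of the means, for example $0.5 \cdot 0.8 \cdot 1.0 = 0.4$ for three generations with means $0.5$, $0.8$ and $1.0$.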

Throughout the paper the symbols $\mathbf{E}$ and $\mathbf{P}$ are used to denote the expectation
and probability with respect to the measure over the environment (with
some obvious exceptions causing no confusion) while the symbols $\mathbf{E}_\pi$, $\mathbf{P}_\pi$ always
stand for the conditional expectation and probability given the environment
$\pi = (\pi_0, \pi_1, \ldots, \pi_n, \ldots)$.

Let $\tau(n) := \min\{k \in [0,n] : S_j \ge S_k,\ j = 0, 1, \ldots, n\}$ be the left-most point
of the interval $[0,n]$ at which the minimal value of $S_j$, $j = 0, 1, \ldots, n$, is attained.
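
For a sampled path the left-most minimum point can be computed in one line. A minimal sketch (hypothetical function name; the path is the list $[S_0, S_1, \ldots]$ and $n$ must not exceed its last index):

```python
def leftmost_minimum(walk, n):
    # tau(n): the smallest k in [0, n] with S_k = min(S_0, ..., S_n).
    prefix = walk[: n + 1]
    return prefix.index(min(prefix))
```

`list.index` returns the first position of the minimal value, which is exactly the left-most point required by the definition when the minimum is attained more than once.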

Theorem 1.1. Let conditions A1* and A2 be valid. Then the distributions of the
random variables

$$\zeta_{0,n} := e^{-S_{\tau(n)}}\, \mathbf{P}_\pi(Z_n > 0) = e^{-S_{\tau(n)}} (1 - f_{0,n}(0)), \quad n \in \mathbb{N}_0, \tag{6}$$

weakly converge as $n \to \infty$ to the distribution of a random variable $\zeta \in [0,1]$ being
positive with probability 1.



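Denote

$$m_n^* := \mathbf{E}_\pi[Z_n \mid Z_n > 0] = e^{S_n}/(1 - f_{0,n}(0)), \quad Y_n := Z_n/m_n^*$$

and set

$$\Phi_n(\lambda, \pi) = \Phi_n(\lambda) := \mathbf{E}_\pi\left[e^{-\lambda Y_n} \mid Z_n > 0\right].$$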



Theorem 1.2. Let conditions A1* and A2 be valid. Then

$$\Phi_n \Rightarrow \Phi, \quad n \to \infty, \tag{7}$$

where $\Phi(\lambda)$, $\lambda \in [0,\infty)$, is the Laplace transform of a probability law belonging
to $\mathcal{L}^+$ with probability 1 and the symbol $\Rightarrow$ stands for the weak convergence of
probability measures on the Borel sets of the metric space $\mathbf{\Phi}$.

Corollary 1.3. If, under the conditions of Theorem 1.2, the offspring generating
functions of the individuals of each generation are fractional-linear:

$$f_n(s) = 1 - \frac{p_n}{1 - q_n} + \frac{p_n s}{1 - q_n s}, \quad 0 < p_n + q_n \le 1, \quad p_n, q_n > 0, \quad n \in \mathbb{N}_0,$$

then

$$\Phi(\lambda) = \frac{1}{1 + \lambda}. \tag{8}$$

The last result is an analogue of the corresponding Yaglom-type limit theorem
for the ordinary critical Galton-Watson processes. It seems, however, that (8) is
not valid in the general situation (see [2] for a relevant discussion of this problem).

Remark 1. Our results generalize those in [7] and [8] in which much stronger
conditions on the characteristics of the branching processes are imposed. Moreover,
here we prove that the limit law corresponding to $\Phi$ has no atom at zero with
probability 1. This problem remained unsolved in the mentioned papers.

Remark 2. Branching processes in random environment meeting conditions A1
and A2 were first investigated in [1]. The mentioned paper deals
with the annealed approach and studies the same problems which we analyze here
under the quenched approach.



2. Change of measures 

Let {Sk}k>o be the associated random walk. Denote Mn := maxo<fc<n Ln := 
mino<jfc<n Sk = Sr(n)^ r = min{n > 1 : 5n > 0 }, 7 = min{n > 1 : 5n < 0 } and set 

mn (x) := P (Ln > -x ) , rhn (x) := P (Mn < x) , x > 0. 

In what follows the symbols c,Ci, i = 1,2, .. . denote positive constants which may 
be different in different formulas. 







Lemma 2.1. (compare, for instance, with [8] or [3], Section 8.9) Under condition A1
there exist functions $l_1(n)$ and $l_2(n)$ slowly varying at infinity with
$l_1(n)\, l_2(n) \sim \pi/\sin \pi\rho$, $n \to \infty$, and absolute constants $c_1 > 0$, $c_2 > 0$ such
that for all $n \ge 1$ and all $x \in [0,\infty)$

$$m_n(x) \le c_1 V(x)/(n^{1-\rho}\, l_1(n)), \quad \tilde{m}_n(x) \le c_2 U(x)/(n^{\rho}\, l_2(n)).$$

In addition, for any fixed $x \in (0,\infty)$ as $n \to \infty$

$$m_n(x) \sim V(x)/(n^{1-\rho}\, l_1(n)), \quad \tilde{m}_n(x) \sim U(x)/(n^{\rho}\, l_2(n)). \tag{9}$$

If condition A1* holds then

$$\mathbf{P}(\gamma > n) \sim 1/(n^{1-\rho}\, l_1(n)), \quad \mathbf{P}(\Gamma > n) \sim 1/(n^{\rho}\, l_2(n)). \tag{10}$$

Denote by II the set of all infinite-dimensional vectors (1) and by E the 
natural cr-algebra generated by the subsets of 11. In view of (2) the offspring gen- 
erating function fn{s) of particles of the n-th generation may be treated as a ran- 
dom vector TTn on the measurable space (II, E). The joint distribution of the tuple 
/o ( 5 ) , /i (s) , fn-i {s) is specified by the product-measure P’^ = PxPx**-xP 
on the measurable space 

(n^, E^) = (n, E) X (n, E) X . . . X (n, e) . 

The infinite sequence /o(s), /i(s), /n-i(s), ... is a random element specified on 
the measurable space (11^, E^) (where E^ is a completion of the infinite direct 
product E X E • • • to a a-algebra) and distributed according to the measure P^ 
which is obtained by extension of the sequence {P^}n>i to (II^,E^) . Such a 
measure exists and is unique by lonescu Tulcea theorem on extending a measure 
and the existence of a random sequence. If P(7ri^^ = 1) = 0 (and we assume this 
condition throughout the paper), then Xn+i := log = ^og/^ (1) ,n G 

No, are well-defined random variables on the probability space (IT^, E^,P^) . 

Let {fn}n>o {fn}n>o independent sequences (realizations) of 

the random environment and let and {*5^}^>o be the corresponding 

associated random walks. Later on any characteristics or random variables related 
with {/“ and {/n }n>o ’ supplied with the symbols - or +, respectively. 

For instance, we write = mino<j<n5'^, F“ = min{n > 1 : S'" > 0} and 
7 “^ = min{n > 1 : S^ < 0}. Set Ak^p = {T~ > fc, 7 ”^ > p} . For any bounded and 
(n^ X n^, E^ X E^) — measurable function -0 : 11^ x 11^ R let 

E (/o-,...,/fc-_i;/o+,...,/pti) U {-S^) V{S+)I{Ak,p}] . (11) 

For S^ < 0 and > 0 we have 

E [U I 5,-] = [/ i-s ;) , E 7(570 I 57 = 1 /( 5 ;). 

Hence it follows that the definition of E[#] is consistent for all k and p. Moreover, 
a = -Ip- (/o",...,/ 0 -i)(or tp = i-e-, if only the first k (the 

last p) variables of 0 are essential then by (3) and independency of the sequences 

n ln >0 and {/+ }„>o 

E[r(/o“, 

= E [r (/o-, 4 -- 1 ) U (-5,-) / {r- > A:}] 







E-[r{fo,-Jk-i)], ( 12 ) 

where I {A} is the indicator of the event A, and 

E[V>+(/o+, f+_,)U{-Sj;)V{S;)I{Ak,p}] 

= E[t/>+(/o+,...,/;_ 0 n 5 +)/{ 7 +>p}] 

=: E+[V>+(/o+,...,/;_0]- (13) 

Relations (12), (13) and (11) specify in a natural way measures on 
(n^,E°°) and P = P“ x P"^ on (11^ x x E^). These measures are 

defined for any Borel sets Aq, Bq? •••? Bp_i c n by taking 

V’" ifo^-Jk-i) &Ai,i = 0,...,k-l), 

V^+ (/o+,...,/+_i) :=/(// e Bj,j = 0,...,p- 1) 
and ^ := 'ip~ X -0+ in (12), (13) and (11), respectively. In turn, P“ and P"^ induce 
in a natural way the corresponding measures on {*5^}fc>o and {S^}k>o for which 
we keep the same notation P“ and P“^. In addition, we use the symbols and 
to denote the laws generated by the measures P“ and P+. The next lemma 
shows the importance of P“, P"*" and P. 



Lemma 2.2. If condition A1 holds then for any hounded and (ll^ x 11^, E^ x E^) — 
measurable function 'll; : x IP H 



lim E [V> (/o , fk-v fo > •••> fp-i) I Ar, 

min(n,r)— voo 



= E[^(/o-,...,4-_i;/o+,...,/+i)]. (14) 



Proof. For k <n and p <r we have 
E[V'(/o-,...,4-_i;/o+,...,/;_l) Mn,r] 

^ E [ip (fo , fk-iJo . fp-i) I {Ak,p} m~_fc (~5fc ) mt-p {S + )] 
P(r- > n)P(7+ > r) 

It follows from Lemma 2.1 that for any fixed k,m and x > 0 



(15) 



lim 



m. 



n—k 



(^) 



= U (x) , lim 



(^) 



V{x). 



-oo P(r- > n) ' ' ’ r-oo P(7+ > r) 

Moreover, there exists a constant C > 0 such that for all a: > 0 and all n > A; 
and r >p 



^n-k (^) 

P(r- > n) 



< CU{x) 



and 



jx) 

P( 7 + > r) 



<CV{x). 



Hence using the dominated convergence theorem and passing to the limit in (15) 
as min (n, r) oo we get (14). 

The lemma is proved. 



It will be convenient for us to use Lemma 2.2 in a wider context. To this aim 
we need the following generalization of Lemma 2.5 in [1]. 

Let Tri be the cr- algebra generated by the random variables /o, /i, . . . , fn-i] 
Zq = 1, Zi, . . , , Zn and T = completion of Vn to a a-algebra. The 

(j-algebras and T+ are defined in a similar way. 







Lemma 2.3. Let condition AV be valid and let Ti^p, l,p E N be a tuple of uniformly 
bounded random variables such that Ti^p is measurable with respect to the a -algebra 
X for any pair l,p. Then 

lim E[T;,p| = E[T;,p]. (16) 

min(n,r)— >oo 

More generally, if the array n,r G N} consists of uniformly bounded random 

variables adapted to the filtration !T~ x and limjnin(n,r)->oo '^n,r =• T exists P— 
a.s., then 

lim E[T„,,|^„,,]=E[T]. (17) 

min(n,r)— >oo 



Proof. Relation (16) can be proved the same way as Lemma 2.2. To demon- 
strate (17) observe that for any numbers a > 1 and I < min(n,r), I G N 



|E[Tn,r ~Tlj\ Acrn,ar]\ ^ E 



\Tn,r - Tij 



m, 



(cr— l)n 



i-S-)m+{S+) 



rri(jfi (O) rri(jf (0) 

^ “ Tlj\U{-S-)V{S+)I {An,r}] 






ca 

cr — 1 






Hence by the conditions of the lemma and the dominated convergence theorem we 
get 

lim sup lim sup |E[Tn,r - ^an,(jr]l = 0. 

l—^oo min(n,r)— ^oo 

In particular, 

E[Tr,,rI{A^n,ar}] = (E[T] + o{l))P{A^n,.r). 

Consequently, 

\E[Tn,rI{An,r}] ~ E[T]P(^„,,)| 

^ \^[Tn,r^ {A(rn,(Tr}] ~ E[T]P(A^^^^7^) | -|- ~ P(^n,r)| 

< (^o(l) + c((l - + (1 - <^^“'’)))p(^n,r), 

since by Lemma 2.1 

I F{A^n,ar) ~ P(^n,r) | = | P(r" > m7)P(7+ > ar) - P(T- > n)P( 7 + > r) \ 

< I P(r~ > ncr) - P(r~ > n) | P( 7 “^ > err) 

-h I P( 7 "^ > r) — P( 7 ”^ > ar) | P(F“ > n) 

< c((l - a-^)a^-^ + (1 - a^-^))P(An,,). 

Therefore, 

B[Tn,rI{AnA] ~ M^] P(^n,r) = o(P(^n,r)), 
as min(Ti,r) ^ oo which is equivalent to (17). 

Remark 3. It follows easily from the proof of Lemma 2.3 that under condi- 
tion AV 

limsup limsup |E[Tn,r — Ti^p | An^r]\ = 0. (18) 

min(Z,p)— )-oo min(n,r)— >oo 







In the subsequent arguments we essentially use the following known results. 

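Lemma 2.4. (see, for instance, [5]). Let $f_j \ne 1$, $0 \le j \le n-1$. Then for any
$0 \le s < 1$ and $0 \le m \le n-1$

$$\frac{1}{1 - f_{m,n}(s)} = \frac{e^{-(S_n - S_m)}}{1 - s} + \sum_{j=m}^{n-1} \eta_{j,n}(s)\, e^{-(S_j - S_m)},$$

where

$$\eta_{j,n}(s) := \chi_j(f_{j+1,n}(s)) \le \eta_{j+1} = \frac{f_j''(1)}{(f_j'(1))^2}.$$

If, in addition, $f_n(s)$ are fractional-linear for all $n = 0, 1, \ldots$, then
$\eta_{j,n}(s) = \eta_{j+1}/2$, $j = 0, 1, \ldots, n-1$.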
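
The fractional-linear case lends itself to a direct numerical check. The sketch below is an illustration with hypothetical function names. It assumes the fractional-linear form $f(s) = 1 - p/(1-q) + ps/(1-qs)$, for which $f'(1) = p/(1-q)^2$ and $f''(1)/(f'(1))^2 = 2q(1-q)/p$, and it compares the two sides of the identity $1/(1 - f_{0,n}(s)) = e^{-S_n}/(1-s) + \frac{1}{2}\sum_{j=0}^{n-1} \eta_{j+1} e^{-S_j}$, which is what the lemma gives for $m = 0$ once $\eta_{j,n}(s) = \eta_{j+1}/2$ is substituted.

```python
import math

def fractional_linear(p, q):
    # f(s) = 1 - p/(1-q) + p*s/(1-q*s); returns (f, f'(1), eta),
    # where eta = f''(1) / f'(1)**2 = 2*q*(1-q)/p.
    def f(s):
        return 1.0 - p / (1.0 - q) + p * s / (1.0 - q * s)
    mean = p / (1.0 - q) ** 2
    eta = 2.0 * q * (1.0 - q) / p
    return f, mean, eta

def both_sides(params, s):
    # params: list of (p_j, q_j) for j = 0..n-1; requires 0 <= s < 1.
    fs, walk, etas = [], [0.0], []
    for p, q in params:
        f, mean, eta = fractional_linear(p, q)
        fs.append(f)
        walk.append(walk[-1] + math.log(mean))  # S_{j+1} = S_j + X_{j+1}
        etas.append(eta)                         # etas[j] holds eta_{j+1}
    u = s
    for f in reversed(fs):                       # u becomes f_{0,n}(s)
        u = f(u)
    n = len(params)
    left = 1.0 / (1.0 - u)
    right = math.exp(-walk[n]) / (1.0 - s) + 0.5 * sum(
        etas[j] * math.exp(-walk[j]) for j in range(n))
    return left, right
```

The two returned numbers agree up to rounding for any admissible parameters, since for a single fractional-linear function $1/(1 - f(s)) - 1/(f'(1)(1-s)) = q(1-q)/p$ does not depend on $s$.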
Lemma 2.5. [1] If conditions A1 and (i) are valid, then 

oo 

< 00 P“^ — a.s. 

j=o 

If conditions A1 and (5) are valid, then 

oo 

< oo P“ — a.s. 



3. Proofs of the main results 

Proof of Theorem 1.1. Clearly, to demonstrate the theorem it suffices to establish
the existence of a random variable $\zeta$ such that

$$\mathbf{P}(\zeta \in (0,1]) = 1 \quad \text{and} \quad \lim_{n \to \infty} \mathbf{E}[g(\zeta_{0,n})] = \mathbf{E}[g(\zeta)]$$

for any function $g(x)$ that is bounded and uniformly continuous in $x \in \mathbb{R}$.

We have

$$\mathbf{E}[g(\zeta_{0,n})] = \sum_{k=0}^{n} \mathbf{E}[g(\zeta_{0,n}) \mid \tau(n) = k]\, \mathbf{P}(\tau(n) = k). \tag{21}$$

It is known ([9], Ch. IV, § 20) that under condition A1

$$n^{-1}\tau(n) \xrightarrow{d} \tau, \quad n \to \infty \tag{22}$$

(here the symbol $\xrightarrow{d}$ means convergence in distribution), where $\tau$ is a random
variable having the generalized arc-sine distribution with parameter $\rho$. The limiting
distribution is absolutely continuous on $(0,1)$ and has no atoms at points 0 and 1.
Thus, for any $\varepsilon > 0$ there exists $\delta \in (0, 1/2)$ such that $\mathbf{P}(\tau(n) \notin [n\delta, n(1-\delta)]) < \varepsilon$
for all sufficiently large $n$. Hence, to establish the desired result it suffices to show
the existence of a random variable $\zeta$ such that

$$\lim_{\min(k, n-k) \to \infty} \mathbf{E}[g(\zeta_{0,n}) \mid \tau(n) = k] = \mathbf{E}[g(\zeta)] \tag{23}$$

and then to apply the dominated convergence theorem to (21).

We proceed to prove (23). Rewriting $\zeta_{0,n}$ as

$$\zeta_{0,n} = (1 - f_{0,\tau(n)}(f_{\tau(n),n}(0)))\, e^{-S_{\tau(n)}},$$

we get 

Eb(Co,n) I T{n) = fc] = E[g((l - /o,fe(/fc,n(0)))e“‘^'‘) | T(n) - A:] 

= I 

where Tk,n-k {/" }„>q and {/+ }„>q are inde- 

pendent sequences of offspring generating functions with 5^ = Yl'jZo lng(/j^)^(l)- 
Set Cfc (s) := (1 - and 77 ^ 0 ( 5 ) := Xfe-i(/fc"-i.o(s))> « 6 [0,1), 

fc G N. Since the limit lim^^oo Ck (^) C (^) exists 

P- a.s. By Lemmas 2.4, 2.5 and relations 

1 



e^k 



—V- = lim — = lim — 

C (S) fe-oo (s) k^oo 1 - q(s) 



= <00 P" - a.s. (24) 









the limiting function ( (s) is P— a.s. uniformly continuous in s from any interval 
contained in [0, 1). For the same reason linin-k^oo f^n-kW •“ exists P— a.s. 

and, moreover, P(^"^ < 1 ) = 1 , since tends to infinity P“^— a.s. as i ^ oo and, 
therefore. 



i — l 



= lim 



l-q+ i-^oo 1 - /g+ (0) i 



lim fe +^r/ 7 i( 0 )e 

l—^OO \ / 






-5+ 



< 5 Z»?/+ie <oo 

7=0 



— a.s. 



(25) 



Hence it follows that C ;= limmin(fc,n-fc)^oo Cfe (/o)n-fe(O)) = C { q '^) exists P- a.s. 
and P(C G (0, 1]) = 1. It is clear that C is a random variable possessing all the prop- 
erties claimed in Theorem 1.1 and, in addition, limmin(/c,n-fc)-^cx) ^/c,n-fc = 9{C) 
P- a.s. Applying now Lemma 2.3 we obtain 

lim E[Tk^n-k I Ak,n-k] = E[5'(C)], 

min(fc,n— fc)— >oo 



proving (23) and, as a result. Theorem 1.1. 
Proof of Theorem 1.2. Evidently, 



$n(A) = E,, 




Zn>0 



1 - /o,n(e-^/<) 
1 - /o,n(0) 



with 1/m* = (1 — fo,n(0))e~^"- = and, therefore, for any function 

G : $ ^ R bounded and uniformly continuous on ^ with respect to metric d and 
any integer fc G [0, n] 

E[G($„(*)) |r(n) = fc] = E[G(i?fc,„_fe(.)) | Ak,n-k], 







where 



Rk,n—k{^) 



l-/M(/otn-fc(0)) 

Cfc~ (/otn-fc(exp{-ACfc (/otn-fc(0))e~^"~'‘})) 

C(/otn-fc(0)) 



Thus, as in Theorem 1.1 it suffices to show that 

lim $(A), A > 0, 

min{k,n—k)—*oo 



(26) 

(27) 



exists P— a.s., since (27) implies, evidently, the respective P— a.s. convergence 
of the Laplace transforms Rk^n-k{^) to $(•) as elements of Recalling The- 
orem 1.1 we conclude that it is enough to establish the existence of the limit 

limmi„(fc,n-fe)-ooC^(/ot„-fc(exp{-ACft (/o+„_fc(0))e“‘®^fc})) P- a.s. In order to 
demonstrate this fact consider first the random function 

Ok,n-kW ■= Cfe (/otn-fe(exp{-Ae“^--^})). 

Let i G No be a branching process generated by the sequence . 

and let = H (Z^, Z^, ..., fi^i ) the minimal cr-algebra gener- 

ated by Z+,Z+,...,Z+ given f+ , f+ , f+v Since E,+ [Z+, \Zf] = 
it follows that the stochastic sequence (^Zf j 1 € N is a nonnegative 

martingale. We know that 

E,r+ [exp{-AZt }] /+. (^exp |-Ae“^«^ , A € [0, oo). 

Hence, for any A > 0 

(exp|-Ae-®*^|j =: Q+ (A) = Q+ (A,7t+) 
exists P— a.s. Moreover, P(Q“^(0) = 1) = 1 in view of 



F^+{Z'^e >x)<x ^E^r+iZ^^e ^ 0, x ^ oo, 

uniformly in z G N. Observe also that P((5“^(A) < 1) = 1 for any A > 0, since S'^ 
tends to infinity P"^— a.s. as j ^ cx) and, as a result. 



lim 



1 - g+(A) A- ^ {_Ae-^,^ } ) 

lim ( 

i—^oo \ 2 — 



exp 






-s+ 


i-1 

^ 


{-Ae-V) 


■ j=0 


— S'"*” 

< oo 


P+ - a.s. 



j=o 



Thus, for any A > 0 

lim 0fc,„_fe(A)=C(Q+(A)) P+- 

mm{k,n—k)^oo 



a.s. 







The function Q^(A) = Q"^(A,7 t), being the Laplace transform of a nonnegative 
nondegenerate random variable, is uniformly continuous in A G [0, oo). Hence we 
deduce by Theorem 1.1 that 

$(A) := lim Rk,n-k{X) 

mm{k,n-k)-^oo 

_ ^ l™min(/c,n-fc)-+co C/c (/i^n-^ "")) 

ll^^min(fc,n-fc)— ^oo Cfc ifo,n-k(^)) 

exists P— a.s. . Moreover, this convergence is P— a.s. uniform in A G [1, 2]. Thus, 

if min(fc, n — k) oo then Rk^n-k{*) converge P— a.s. to $(•) as elements of 4^. 
Consequently, for any function G : $ ^ R bounded and uniformly continuous on 
^ with respect to metric d we have 

. lim E[G($„C)) I r{n) = fc] = E[G($(.))]. (29) 

min(/c,n— Ac)— >oo 

Hence, in view of (22) we get 

n 

lim E[G($„(.))] = lim Ve[G(^„(.)) | r(n) = A:]P(r(n) = k)= E[G($(*))]. 

n — >■00 n — >00 ' 

Thus, ^ ^ 00 . 

It remains to check that the probability law with Laplace transform $(A) 
has no atom at zero P— a.s. and, therefore, belongs to L+ with probability 1. 
Clearly, it suffices to prove that limA-^oo^(A) = 0 P— a.s. To this aim let 
= Ymii^oo shown in the proof of Proposition 3.1 in [1] that 

{W^ > 0} = {Z+ > 0 for all i} P+- a.s. Thus, 

q+ = P^+( lim =Q) = P^+(IT+ = 0) - lim Q+(A) P+ - a.s. 

2 — >00 A— >CXD 

From this equality. Theorem 1.1, the continuity of C~(^) in s G [0, 1) and (28) it 
is easy to deduce that 

lim $(A) = 1 - = 0 P - a.s.. 

A^oo Q 

This completes the proof of Theorem 1.2. 



Proof of Corollary 1.3. The proof of this corollary repeats almost literally the 
respective arguments used in [7] to establish (8) under stronger conditions and we 
give the needed arguments for completeness only. 

It follows from (24) and Lemmas 2.4 and 2.5 that 



r (s) = 





P — a.s. 



In particular, 

< = c- (,-) = ( + r) . c- {Q* m = (rr^ + r) 







Since ^ oo P-a.s, as j oo, applying Lemma 2.4 to the fractional-linear 
offspring functions gives 



^ \ j=0 / j=0 



P - a.s. 



and 



lim 



— S'”*” 

e 



. k-l 

i _l_ 5+ 

1 - Q+ (AC) min(fc,n-fc)-.oo ^ 1 _ gxp |-ACo,ne“^^ 



1 1^ , c+ 1 1 - 

^ = — + ^ r p - a.s. 



AC 1-9+ 

J —^ 

Hence we obtain 
as required. 



-1 



1 + A 



References 

[1] Afanasyev V.I., Geiger J., Kersting G., and Vatutin V.A. Criticality for branching
processes in random environment. Submitted to Annals of Probability, 2003.

[2] Athreya K.B. and Karlin S. Branching processes with random environments, II: limit
theorems. Ann. Math. Statist., 42:1843-1858, 1971.

[3] Bingham N.H., Goldie C.M., and Teugels J.L. Regular variation. Cambridge University
Press, Cambridge, 1987.

[4] Doney R.A. Spitzer's condition and the ladder variables in random walks. Probab.
Theory Relat. Fields, 101:577-580, 1995.

[5] Geiger J. and Kersting G. The survival probability of a critical branching process in
random environment. Theory Probab. Appl., 45:607-615, 2000.

[6] Hirano K. Determination of the limiting coefficient for exponential functionals of random
walks with positive drift. J. Math. Sci. Univ. Tokyo, 5:299-332, 1998.

[7] Vatutin V.A. and Dyakonova E.E. Reduced branching processes in random environment.
In: Mathematics and Computer Science II: Algorithms, Trees, Combinatorics
and Probabilities (Ed. B. Chauvin, P. Flajolet, D. Gardy, A. Mokkadem), Basel - Boston
- Berlin: Birkhäuser, 455-467, 2002.

[8] Vatutin V.A. and Dyakonova E.E. Galton-Watson branching processes in random
environment, I: limit theorems. Theory Probab. Appl., 48:274-300, 2003. (In Russian.)

[9] Spitzer F. Principles of random walk. Princeton NJ, Toronto - New York - London,
1964.

Vladimir Vatutin and Elena Dyakonova 

Steklov Mathematical Institute 

Gubkin street, 8 

117966, Moscow 

Russia 

e-mail: vatutin@mi.ras.ru, elena@mi.ras.ru 




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Two-Dimensional Limit Theorem for a Critical 
Catalytic Branching Random Walk 

Valentin Topchii and Vladimir Vatutin 



ABSTRACT: A continuous time branching random walk on the lattice $\mathbb{Z}$ is
considered in which individuals may produce children at the origin only. Assuming
that the underlying random walk is symmetric and the offspring reproduction law
is critical we study the asymptotic behavior of the joint distribution of the number
of individuals at the origin and outside the origin at moment $t$ as $t \to \infty$ given
that there are individuals at the origin at this moment.

1. Introduction 

In [1] and [2] we investigated the following modification of a standard branching
random walk on $\mathbb{Z}$. The population is initiated at time $t = 0$ by a single particle.
Being outside the origin the particle performs a continuous time random walk
(the underlying random walk) on $\mathbb{Z}$ with infinitesimal transition matrix

$$A = \|a(x,y)\|_{x,y \in \mathbb{Z}}, \quad a(0,0) < 0,$$

until the moment when it hits the origin. At the origin it spends an exponentially
distributed time with parameter 1 and then either jumps to a point $y \ne 0$ with
probability $-(1-\alpha)\,a(0,y)\,a^{-1}(0,0)$ or dies with probability $\alpha$ producing just before
the death a random number of children $\xi$ in accordance with offspring generating
function

$$f(s) := \mathbf{E}s^{\xi} = \sum_{k=0}^{\infty} f_k s^k. \tag{1}$$

At the birth moment the newborn particles are located at the origin and from
this point they begin their own branching random walks behaving independently
and stochastically the same as the parent individual.

We assume that the process is initiated at time t = 0 by a single individual 
located at the origin and impose the following restrictions on the characteristics 
of the process: 

Hypothesis (I): The underlying random walk on $\mathbb{Z}$ is symmetric, irreducible
and homogeneous ($a(x,y) = a(0, y-x) \stackrel{\mathrm{def}}{=} a(y-x)$) with $a(x) \ge 0$ if $x \ne 0$,
$a(0) < 0$,

$$\sum_{x \in \mathbb{Z}} a(x) = 0, \quad \text{and} \quad \sum_{x \in \mathbb{Z}} x^2 a(x) < \infty.$$

Hypothesis (II): The offspring process is critical ($f'(1) = 1$) and
$f''(1) \in (0,\infty)$.

^The first author is supported in part by the grants RFBR 03-01-00045, Russian Scientific School 
2139.2003.1; the second author is supported in part by the grants RFBR 02-01-00266, Russian 
Scientihc School 1758.2003.1, and INTAS 03-51-5018 







^2 < oo and ^ ( 0 , oo). 

x€Z 

Let $\zeta(t)$ denote the number of particles in the process located at time $t$ at
the origin, $\mu(t)$ denote the number of particles in the process at time $t$ outside
the origin, and let $\eta(t) = \zeta(t) + \mu(t)$ be the total number of individuals in the
process at moment $t$. The aim of the present paper is to establish a conditional
limit theorem for a suitably normalized vector $(\zeta(t), \mu(t))$ given that $\zeta(t) > 0$.

The model we investigate here was considered in a wider context in [3]-[6]. In
the mentioned papers integral equations were deduced (in a more general situation
of random walks on the lattice $\mathbb{Z}^d$, $d \ge 1$) for the generating functions of
the number of particles located at moment $t$ at point $x \in \mathbb{Z}^d$, and the total
size of the population at time $t$, and the asymptotic behavior of the moments of
these random variables was investigated. The present paper, being a continuation
of papers [1] and [2], gives for the case $d = 1$ more detailed information about the
properties of the process under study.

Put 



^ def 2 ^/^ 



an 



1/4 



6(1 — a) 



and Ca nK^ = 



2v^6(l - a) 



a 



In the sequel we need the following results established in [1] and [2]. 



Theorem 1.1. [1] Let hypotheses (I) and (II) be valid. Then, as $t \to \infty$,
Q{t) P(mW > 0) ~ P(t7(0 > 0) ~ 

q{t) P(C(i)>0)~-|^, 

^/t hit 



and for any s e [0, 1] 



lim E 

t— >oo 



s^^^^ I rj{t) > 0 



= lim E 

t— >-oo 



>0 



= 1 - Vl-s. 



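Theorem 1.2. [2] Let hypotheses (I) and (II) be valid. Then for any $\lambda \in [0,\infty)$

$$\lim_{t \to \infty} \mathbf{E}\left\{ \exp\left\{ -\frac{\lambda \zeta(t)}{\mathbf{E}\{\zeta(t) \mid \zeta(t) > 0\}} \right\} \,\Big|\, \zeta(t) > 0 \right\} = \frac{1}{\lambda + 1}.$$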



Theorems 1.1 and 1.2 provide no information about the number of individuals
outside the origin given that there are individuals at the origin. In the present
paper we fill this gap and prove a theorem describing, in particular, the asymptotic
behavior of $\mu(t)$ given that $\zeta(t) > 0$. To formulate our main result we need
additional notation. Let $\varphi(\lambda_1)$ be a continuous positive solution of the equation

(^(Ai) = 1 - aa‘^Xy^ ( - v))dv^^'^, Ai > 0 (2) 

Jo 

(according to Lemma 7 in [9] such a solution exists and is unique). Let, further, 
D{X 2 ) be a bounded solution of the equation 

A 1 2x ^ X /OA 

D{X 2 ) = l-aa X 2 / 7 — — dy, X 2 > 0, (3) 

Jo V 2 /(l- 2 /) 

which is unique and positive as is shown later in the paper. 







Theorem 1.3. Let hypotheses (I) and (II) be valid and, in addition, 
h"(x) = o(| ln“^ x\) as x +0. 

Then for all Ai , A 2 > 0 



lim E 

t—^00 




AiC(0 

E{C{t)\C{t)>0} 





D{\2) 

1 + Ai’ 



where ^ by/^{l - a). 



Thus, under our scaling the numbers of individuals at the origin and outside
the origin are independent at those moments (rather rare, as follows from Theorem
1.1) when $\zeta(t) > 0$.

Before passing to the proof of Theorem 1.3 we say a few words about the
asymptotic behavior of the scaling function $\mathbf{E}\{\zeta(t) \mid \zeta(t) > 0\}$. To this aim we
temporarily forget that our random walk has a point of catalyst and consider an
ordinary random walk on $\mathbb{Z}$ satisfying Hypothesis (I). Assume that the random
walk starts at the origin at time $t = 0$ and let $\tau_1$ be the time spent by the
walking particle at the origin until it leaves the origin and let $\tau_2$ be the time
spent by this particle outside the origin until the first return to the origin. Set
$G_1(t) := \mathbf{P}(\tau_1 \le t) = 1 - e^{-t}$, $G_2(t) := \mathbf{P}(\tau_2 \le t)$ and

$$G_3(t) := \alpha G_1(t) + (1 - \alpha)\, G_1 * G_2(t), \tag{4}$$

where $*$ is the convolution symbol. It is shown in [1] that the function
$P(t) := \mathbf{E}\zeta(t)$ has the representation

$$P(t) = (1 - G_1(\cdot)) * U(t) \tag{5}$$

where

$$U(t) := \sum_{k=0}^{\infty} G_3^{*k}(t) \tag{6}$$

with $G_3^{*0}(t) = 1$ and $G_3^{*k}(t)$, $k \ge 1$, the $k$-th convolution of $G_3(t)$ with itself.

Moreover, the following statement is valid:



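Lemma 1.4. [1] $P(t)$ is monotone decreasing and $P(t) \sim C_P\, t^{-1/2}$ as $t \to \infty$.

Combining this statement with Theorem 1.1 we see that

$$\mathbf{E}\{\zeta(t) \mid \zeta(t) > 0\} = \frac{\mathbf{E}\zeta(t)}{\mathbf{P}\{\zeta(t) > 0\}} = \frac{P(t)}{q(t)} \sim c^* \ln t, \quad t \to \infty,$$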

where c* CpC~^ — acr^(47r)~^6”^(l— a)“^. As a result we get that given ({t) > 0 
the number of individuals at the origin is of order $\ln t$ (up to a random multiplier)
while the number of particles outside the origin is of order $\sqrt{t}$.


2. Branching random walk and Bellman-Harris 
processes 

Similarly to [1] and [2] we prove Theorem 1.3 by introducing an auxiliary
Bellman-Harris branching process $(Z_1(t), Z_2(t))$ with two types of particles (see [7, 8]),
where by $Z_i(t)$, $i = 1, 2$, we denote the number of individuals of type $i$ in this
process at time $t$.






Let

$$F_i(t; s_1, s_2) := \mathbf{E}\left\{ s_1^{Z_1(t)} s_2^{Z_2(t)} \mid Z_i(0) = 1 \right\}, \quad i = 1, 2,$$

be the probability generating function of the number of individuals of both types
given that the process is initiated at time zero by a single individual of type $i$.

The critical Bellman-Harris process with two types of individuals we are
interested in is described as follows. A particle of the first type has life length
distribution $G_1(t) = \mathbf{P}(\tau_1 \le t) = 1 - e^{-t}$, $t \ge 0$, and dying produces offspring of
two types in accordance with probability generating function $f_1(s_1, s_2) = \alpha f(s_1) +
(1 - \alpha) s_2$, that is, it produces with probability $\alpha f_k$ exactly $k \ge 0$ particles of the first
type and with probability $1 - \alpha$ exactly one particle of the second type (recall the
definition of $f(s)$ in (1)). The life length distribution of a particle of the second type
is $G_2(t) = \mathbf{P}(\tau_2 \le t)$ (that is, it coincides in distribution with the time spent outside
the origin by the parent individual of the catalytic branching random walk under
investigation until the first return to the origin provided that the initial individual
is located at point 0 at time $t = 0$ and it does not produce children during its first
stay at 0). Dying, a particle of the second type produces offspring in accordance
with probability generating function $f_2(s_1, s_2) = s_1$, that is, it produces exactly
one particle of the first type and nothing else.

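This Bellman-Harris process is critical and indecomposable since the maximal
in absolute value eigenvalue (the Perron root) of its mean matrix

$$M = \begin{pmatrix} \alpha f'(1) & 1 - \alpha \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} \alpha & 1 - \alpha \\ 1 & 0 \end{pmatrix}$$

equals 1 and all the elements of $M^2$ are positive.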
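
The criticality claim can be checked directly. The sketch below uses hypothetical helper names and assumes that the mean matrix consists of the partial derivatives $\partial f_i/\partial s_j$ at $s_1 = s_2 = 1$, that is, rows $(\alpha f'(1), 1-\alpha)$ and $(1, 0)$ with $f'(1) = 1$. It computes the larger eigenvalue of this $2 \times 2$ matrix from its characteristic polynomial and forms the square of the matrix.

```python
import math

def mean_matrix(alpha):
    # Rows: offspring means of a type-1 and of a type-2 particle,
    # for f_1(s1, s2) = alpha*f(s1) + (1-alpha)*s2 with f'(1) = 1
    # and f_2(s1, s2) = s1.
    return [[alpha, 1.0 - alpha], [1.0, 0.0]]

def perron_root(m):
    # Larger root of x^2 - trace*x + det = 0 for a 2x2 matrix.
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return (trace + math.sqrt(trace * trace - 4.0 * det)) / 2.0

def square(m):
    # Matrix product m * m for a 2x2 matrix.
    return [[sum(m[i][k] * m[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
```

The characteristic polynomial is $x^2 - \alpha x - (1 - \alpha) = (x - 1)(x + 1 - \alpha)$, so the roots are $1$ and $-(1 - \alpha)$ for every $\alpha \in (0,1)$, and the squared matrix has no zero entries although the matrix itself does.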
It is not difficult to understand that 






(7) 



Recall that under our assumptions on the form of $f_i(s_1, s_2)$, $i = 1, 2$, we have
(see [8], Chapter VIII, §1, or [7])

$$F_1(t; s_1, s_2) = s_1 (1 - G_1(t)) + \int_0^t \left( \alpha f(F_1(t-u; s_1, s_2)) + (1 - \alpha) F_2(t-u; s_1, s_2) \right) dG_1(u),$$

$$F_2(t; s_1, s_2) = s_2 (1 - G_2(t)) + \int_0^t F_1(t-u; s_1, s_2)\, dG_2(u).$$

Using the second of these equalities in the first and introducing
$Q(t; s_1, s_2) \stackrel{\mathrm{def}}{=} 1 - F_1(t; s_1, s_2)$ we see that

\[ Q(t; s_1, s_2) = (1 - s_1)(1 - G_1(t)) + (1 - s_2)(1 - \alpha)(1 - G_2(\cdot)) * G_1(t)
 + \int_0^t Q(t-u; s_1, s_2)\, dG_3(u) - \alpha \int_0^t h(Q(t-u; s_1, s_2))\, dG_1(u), \tag{8} \]

where $h(x) \stackrel{\mathrm{def}}{=} f(1-x) - (1-x)$ and $G_3(t)$ is from (4).

Solving renewal equation (8) we obtain by (5) and (6)

\[ Q(t; s_1, s_2) = (1 - s_1) P(t) + (1 - \alpha)(1 - s_2)(1 - G_2(\cdot)) * G_1 * U(t)
 - \alpha \int_0^t h(Q(t-u; s_1, s_2))\, d(G_1 * U(u)). \tag{9} \]




Catalytic branching walk 



391 



Note that

\[ (1-\alpha)(1 - G_2(\cdot)) * G_1(t) = (1-\alpha) G_1(t) - (1-\alpha) G_2 * G_1(t) = G_1(t) - G_3(t) \]

and, consequently,

\[ (1-\alpha)(1 - G_2(\cdot)) * G_1 * U(t) = G_1 * U(t) - G_3 * U(t)
 = 1 + G_1 * U(t) - U(t) = 1 - (1 - G_1(\cdot)) * U(t) = 1 - P(t). \]

Since $G_1(t) = 1 - e^{-t}$ it follows that $(G_1 * U(t))' = (1 - G_1(\cdot)) * U(t) = P(t)$.
Using the identities above we rewrite (9) as follows:

\[ Q(t; s_1, s_2) = (1 - s_1) P(t) + (1 - P(t))(1 - s_2) - \alpha \int_0^t h(Q(t-u; s_1, s_2)) P(u)\, du. \tag{10} \]



Put $q(t; s) \stackrel{\mathrm{def}}{=} Q(t; s, 1)$. Observe that $q(t) = P(\zeta(t) > 0) = q(t; 0)$. Setting
$s_1 = s$ and $s_2 = 1$ in (10) leads to the equality

\[ q(t; s) = (1 - s) P(t) - \alpha \int_0^t h(q(t-u; s)) P(u)\, du. \]

Denote 



si{t) = si{t;Xi) = 



^ f Ai 

P(t) f E{C(t)|C(i)>0} 



S 2 {t) = S 2 {t] X 2 ) = exp 




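\[ \Delta(t; \lambda_1, \lambda_2) = q(t; 0) - q(t; s_1(t; \lambda_1))
 + \left( Q(t; s_1(t; \lambda_1), s_2(t; \lambda_2)) - Q(t; 0, s_2(t; \lambda_2)) \right). \]

It is not difficult to check that

\[ H_t(\lambda_1, \lambda_2) \stackrel{\mathrm{def}}{=} \mathbf{E}\left[ s_1^{Z_1(t)}(t)\, s_2^{Z_2(t)}(t) \,\middle|\, Z_1(t) > 0 \right]
 = \frac{F_1(t; s_1(t), s_2(t)) - F_1(t; 0, s_2(t))}{P(Z_1(t) > 0)}
 = \frac{Q(t; 0, s_2(t)) - Q(t; s_1(t), s_2(t))}{q(t)}
 = \phi_t(\lambda_1) - \psi_t(\lambda_1, \lambda_2), \tag{11} \]

where

\[ \phi_t(\lambda_1) \stackrel{\mathrm{def}}{=} \frac{q(t) - q(t; s_1(t))}{q(t)}, \qquad
 \psi_t(\lambda_1, \lambda_2) \stackrel{\mathrm{def}}{=} \frac{\Delta(t; \lambda_1, \lambda_2)}{q(t)}. \tag{12} \]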



By Theorem 1.2 and (7)

\[ \lim_{t \to \infty} \phi_t(\lambda_1) = \frac{1}{1 + \lambda_1}. \]

Hence, to prove Theorem 1.3 it is necessary to find $\lim_{t\to\infty} \psi_t(\lambda_1, \lambda_2)$. To solve
this problem we need the following two basic representations, which can be deduced
from equation (10) by rather tedious and complicated arguments too lengthy to
be given in the present paper:







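Lemma 2.1. Under the conditions of Theorem 1.3

\[ \psi_t(\lambda_1, \lambda_2) = \alpha\sigma^2 \frac{\lambda_2}{1 + \lambda_1} \int_0^{1-\varepsilon} \frac{\varphi(\lambda_2^2 y)}{\sqrt{y(1-y)}}\, dy
 - \alpha\sigma^2 \lambda_2 \int_0^{1-\varepsilon} \frac{\psi_{ty}(\lambda_1, \lambda_2\sqrt{y})\, \varphi(\lambda_2^2 y)}{\sqrt{y(1-y)}}\, dy + r_{0,\varepsilon}(t; \lambda_1, \lambda_2), \]

where

\[ \lim_{\varepsilon \to +0} \limsup_{t \to \infty} \sup_{(\lambda_1, \lambda_2) \in D} |r_{0,\varepsilon}(t; \lambda_1, \lambda_2)| = 0 \]

for any bounded set $D \subset [0, \infty) \times [0, \infty)$, or, in view of (11) and (12),

\[ H_t(\lambda_1, \lambda_2) = \frac{1}{1 + \lambda_1} - \alpha\sigma^2 \lambda_2 \int_0^{1-\varepsilon} \frac{H_{ty}(\lambda_1, \lambda_2\sqrt{y})\, \varphi(\lambda_2^2 y)}{\sqrt{y(1-y)}}\, dy + r_\varepsilon(t; \lambda_1, \lambda_2), \tag{13} \]

where

\[ \lim_{\varepsilon \to +0} \limsup_{t \to \infty} \sup_{(\lambda_1, \lambda_2) \in D} |r_\varepsilon(t; \lambda_1, \lambda_2)| = 0. \]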



Our next goal is to show that $H_t(\lambda_1, \lambda_2) \to D(\lambda_2)(1 + \lambda_1)^{-1}$ as $t \to \infty$, where
$D(\lambda_2)$ solves equation (3). However, first we need to establish that a solution of
equation (3) exists and is unique.

The uniqueness problem is partially solved by means of the following two
lemmas. The first can be demonstrated by standard methods, and we omit
its proof.



Lemma 2.2. In the domain $\lambda_2 \ge 0$ there are no bounded solutions of the equation

\[ \psi(\lambda_2) = \alpha\sigma^2 \lambda_2 \int_0^1 \psi(\lambda_2\sqrt{y})\, \frac{\varphi(\lambda_2^2 y)}{\sqrt{y(1-y)}}\, dy \]

other than $\psi(\lambda_2) \equiv 0$.

Lemma 2.3. Equation (3) has at most one solution on the half-line $\lambda_2 \ge 0$. Moreover,
in the range $\lambda_2 \in [0, 2p)$, where $p = (2\alpha\pi\sigma^2)^{-1}$, there exists exactly one
bounded positive solution of (3).

Sketch of the proof. The first part of Lemma 2.3 is a corollary of Lemma 2.2.
To demonstrate the second part we let $C[0, 2p)$ be the set of continuous functions
on $[0, 2p)$ and for functions $T(\cdot) \in C[0, 2p)$ define an operator $L : C[0, 2p) \to
C[0, 2p)$ by the equality

\[ LT(\lambda_2) \stackrel{\mathrm{def}}{=} 1 - \alpha\sigma^2 \lambda_2 \int_0^1 \frac{T(\lambda_2\sqrt{y})\, \varphi(\lambda_2^2 y)}{\sqrt{y(1-y)}}\, dy. \]

Setting $T^0(\lambda_2) \equiv 1$, defining $T^{n+1}(\lambda_2) = LT^n(\lambda_2)$, $n \in \mathbb{N}$, and recalling that
$0 < \varphi(x) < 1$ if $x > 0$ by (2), it is not difficult to check by induction that
for $\lambda_2 \in [0, 2p)$

\[ 0 < T^n(\lambda_2) \le 1, \qquad |T^{n+1}(\lambda_2) - T^n(\lambda_2)| \le (\lambda_2 / 2p)^n. \]

Hence the second part of the lemma follows.



Lemma 2.4. Under the conditions of Theorem 1.3 for all $\lambda_1 \ge 0$ and $\lambda_2 \in [0, p]$

\[ \lim_{t \to +\infty} H_t(\lambda_1, \lambda_2) = \frac{D(\lambda_2)}{1 + \lambda_1}. \]







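Proof. It is easy to see that in the range of $\lambda_1$ and $\lambda_2$ we are interested in,
$H(\lambda_1, \lambda_2) \stackrel{\mathrm{def}}{=} D(\lambda_2)(1 + \lambda_1)^{-1}$ satisfies the relation

\[ H(\lambda_1, \lambda_2) = \frac{1}{1 + \lambda_1} - \alpha\sigma^2 \lambda_2 \int_0^1 \frac{H(\lambda_1, \lambda_2\sqrt{y})\, \varphi(\lambda_2^2 y)}{\sqrt{y(1-y)}}\, dy. \tag{14} \]

Moreover, using Lemma 2.3 one can show that there is no $H(\lambda_1, \lambda_2)$ other than
$D(\lambda_2)(1 + \lambda_1)^{-1}$ which solves equation (14) for $(\lambda_1, \lambda_2) \in [0, \infty) \times [0, p]$. By (13)
it is not difficult to deduce that for any $\varepsilon \in (0, 1/2)$

\[ |H_t(\lambda_1, \lambda_2) - H(\lambda_1, \lambda_2)| \le \alpha\sigma^2 \lambda_2 \int_0^{1-\varepsilon} \left| H_{ty}(\lambda_1, \lambda_2\sqrt{y}) - H(\lambda_1, \lambda_2\sqrt{y}) \right| \frac{\varphi(\lambda_2^2 y)}{\sqrt{y(1-y)}}\, dy + r_\varepsilon(t; \lambda_1, \lambda_2), \]

where

\[ \lim_{\varepsilon \to +0} \limsup_{t \to \infty} \sup_{(\lambda_1, \lambda_2) \in D} |r_\varepsilon(t; \lambda_1, \lambda_2)| = 0 \]

for any $D \subset [0, \infty) \times [0, p]$. Setting

\[ M_t(\lambda_1; \lambda_2) = \sup_{0 \le z \le \lambda_2} |H_t(\lambda_1, z) - H(\lambda_1, z)|, \qquad M_T(\lambda_1; p) = \sup_{t \ge T} M_t(\lambda_1; p), \]

it is easy to see that for all sufficiently large $t$ and all $\lambda_1 \ge 0$ and $\lambda_2 \in [0, p]$

\[ |H_t(\lambda_1, \lambda_2) - H(\lambda_1, \lambda_2)| \le g(\varepsilon) + 2^{-1} \sup_{y \ge \varepsilon} M_{ty}(\lambda_1; p) = g(\varepsilon) + 2^{-1} M_{t\varepsilon}(\lambda_1; p), \]

where $g(\varepsilon)$ is a function of $\varepsilon$ such that $\lim_{\varepsilon \downarrow 0} g(\varepsilon) = 0$. Therefore, for $t \ge t_0(\varepsilon)$,

\[ M_t(\lambda_1; \lambda_2) \le g(\varepsilon) + 2^{-1} M_{t\varepsilon}(\lambda_1; p). \]

Hence

\[ M_T(\lambda_1; p) \le g(\varepsilon) + 2^{-1} M_{T\varepsilon}(\lambda_1; p) \]

for all sufficiently large $T$. Setting $M(\lambda_1; p) = \lim_{T \to \infty} M_T(\lambda_1; p)$ we get

\[ M(\lambda_1; p) \le g(\varepsilon) + 2^{-1} M(\lambda_1; p). \]

Since $\varepsilon > 0$ may be selected arbitrarily small, it follows that $M(\lambda_1; p) = 0$, proving
the lemma.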



3. Proof of Theorem 1.3 



From Lemma 2.4, relations (12), (11), (7), and the uniqueness theorem for Laplace
transforms we conclude that there exists a function $W(\lambda_1, \lambda_2)$ analytic in the
domain $\mathrm{Re}\,\lambda_1 > 0$, $\mathrm{Re}\,\lambda_2 > 0$ such that



lim E 

t^oo 




AiC(t) 

E{C(i) K(f) > 0} 





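\[ = W(\lambda_1, \lambda_2) \]

for all $\lambda_1, \lambda_2$ with $\mathrm{Re}\,\lambda_1 > 0$, $\mathrm{Re}\,\lambda_2 > 0$. In particular,

\[ W(\lambda_1, \lambda_2) = \frac{D(\lambda_2)}{1 + \lambda_1} \quad \text{for } (\lambda_1, \lambda_2) \in [0, \infty) \times [0, p]. \]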







Letting $\lambda_1 = 0$ we see that there exists a unique analytic extension of $D(\lambda_2)$ to
the domain $\mathrm{Re}\,\lambda_2 > 0$. We keep the same symbol $D(\lambda_2)$ for this extension. Clearly,
$D(\lambda_2)$ is the Laplace transform of a non-degenerate distribution on the positive
half-line and, in particular,

\[ \lim_{t \to \infty} H_t(\lambda_1, \lambda_2) = W(\lambda_1, \lambda_2) = \frac{D(\lambda_2)}{1 + \lambda_1}, \qquad (\lambda_1, \lambda_2) \in [0, \infty) \times [0, \infty). \]



Thus, the proof of Theorem 1.3 will be finished if we establish that $D(\lambda_2)$ solves (3)
on the positive half-line. Note that we have proved in Lemma 2.2 that if a solution
of (3) exists then it is unique. Evidently, $0 < H_t(\lambda_1, \lambda_2) < 1$ and $0 < \varphi(x) < 1$
for $x > 0$. Therefore,

\[ \frac{H_{ty}(\lambda_1, \lambda_2\sqrt{y})\, \varphi(\lambda_2^2 y)}{\sqrt{y(1-y)}} \le \frac{1}{\sqrt{y(1-y)}} \]

for all $y \in (0, 1)$, $\lambda_2 \ge 0$.



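Recalling that $D(\lambda_2)$, being a Laplace transform, is monotone decreasing in the
domain $\lambda_2 \ge 0$ and applying the dominated convergence theorem to (13) we obtain

\[ D(\lambda_2) = 1 - \alpha\sigma^2 \lambda_2 \int_0^{1-\varepsilon} \frac{D(\lambda_2\sqrt{y})\, \varphi(\lambda_2^2 y)}{\sqrt{y(1-y)}}\, dy + r_{0,\varepsilon}(\lambda_1, \lambda_2), \tag{15} \]

where

\[ \lim_{\varepsilon \to +0} \sup_{(\lambda_1, \lambda_2) \in D} |r_{0,\varepsilon}(\lambda_1, \lambda_2)| = 0. \]

In view of $0 < D(\lambda_2) \le 1$ for all $\lambda_2 \ge 0$ we may apply the dominated convergence
theorem to (15) as $\varepsilon \downarrow 0$. As a result we get that $D(\lambda_2)$ indeed solves (3) on the
positive half-line.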

Theorem 1.3 is proved. 



References 

[1] V. A. Topchii, V. A. Vatutin, E. B. Yarovaya, Catalytic branching random walk and
queueing systems with random number of independent servers. Theory of Probability
and Mathematical Statistics, v. 69, 158-172 (2003).

[2] V. A. Topchii, V. A. Vatutin, Individuals at the origin in the critical catalytic
branching random walk. Discrete Mathematics & Theoretical Computer Science, v. 6,
325-332 (2003).

[3] S. Albeverio, L. V. Bogachev, Branching random walk in a catalytic medium. I. Basic
equations. Positivity, v. 4, 41-100 (2000).

[4] S. Albeverio, L. V. Bogachev, E. B. Yarovaya, Asymptotics of branching symmetric
random walk on the lattice with a single source. C. R. Acad. Sci. Paris Sér. I Math.,
v. 326, 975-980 (1998).

[5] L. V. Bogachev, E. B. Yarovaya, A limit theorem for a supercritical branching random
walk on $\mathbb{Z}^d$ with a single source. Russian Math. Surveys, v. 53, 1086-1088 (1998).

[6] L. V. Bogachev, E. B. Yarovaya, Moment analysis of a branching random walk on a
lattice with a single source. Dokl. Akad. Nauk, v. 363, 439-442 (1998). (In Russian.)

[7] T. E. Harris, (1963) The Theory of Branching Processes. Springer-Verlag, Berlin;
Prentice-Hall, Inc., Englewood Cliffs, N.J.

[8] B. A. Sewastjanow, (1974) Verzweigungsprozesse. Akademie-Verlag, Berlin.

[9] V. A. Vatutin, Critical Bellman-Harris branching processes starting with a large
number of particles. Mat. Zametki, v. 40, 527-541 (1986). (In Russian.)







Valentin Topchii 

Omsk Branch of Sobolev Institute of Mathematics 
Pevtcova Street, 13 
644099 Omsk 
Russia 

e-mail: topchij@iitam.omsk.net.ru 

Vladimir Vatutin 

Steklov Mathematical Institute 
Gubkin Street, 8 
119991 Moscow 
Russia 

e-mail: vatutin@mi.ras.ru 




Part VI 

Combinatorial Stochastic Processes 




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



A Combinatorial Approach to Jumping 
Particles II: General Boundary Conditions 

Enrica Duchi and Gilles Schaeffer 



ABSTRACT: We consider a model of particles jumping on a row of cells, 
called in physics the one dimensional totally asymmetric exclusion process with 
open boundaries (TASEP). From the point of view of combinatorics a remarkable 
feature of this Markov chain is that Catalan numbers are involved in several entries 
of its stationary distribution. 

In a companion paper, we gave a combinatorial interpretation and a simple 
proof of these observations in the simplest case where the particles enter, jump 
and exit at the same rate. To do this we revealed a second row in which particles 
travel backward and defined on these two row configurations a Markov chain with 
uniform stationary distribution which is a covering of the TASEP. 

In this paper we show how to deal with general rates. The covering chain 
is still defined on two row configurations, but its stationary distribution is not 
uniform anymore. Instead it is described in terms of two natural combinatorial 
statistics. 

1. Jumping particles 

We shall consider a model of jumping particles on a row of $n$ cells that has been
widely studied in physics since the early 1990s, under the name one dimensional
totally asymmetric exclusion process with boundaries, or TASEP for short. Roughly
speaking, the TASEP consists of black particles entering a row of $n$ cells from an
infinite reservoir on the left hand side and randomly hopping to the right with the
simple exclusion rule that each cell may contain only one particle.

Figure 1. An informal illustration of the TASEP.

The TASEP is usually defined as a continuous-time Markov process on a finite 
set of configurations of particles on a line. We shall use an alternative definition 
as a finite state Markov chain — with discrete time — which is more convenient 
for our combinatorial purpose. One could insist on calling our chain the TASEC, 
with “C” for chain instead of “P” for process, but as we will argue later, there 
is no need for this distinction. Another cosmetic modification we allow ourselves 
consists in putting a white particle in each empty cell, to make explicit the left- 
right particle-hole symmetry of the system. 







1.1. The basic system, or TASEP 

A basic configuration is a row of $n$ cells, each containing either one black particle
or one white particle (see Figure 2). These cells are delimited by $n + 1$ walls: the
left border (or wall 0), the $i$th separation wall for $i = 1, \ldots, n - 1$, and the right
border (or wall $n$).






Figure 2. A basic configuration with n = 10 cells. 



The TASEP, which we shall sometimes refer to as the basic system, is a
Markov chain $S_{\alpha\beta\gamma}$ defined for any three parameters $\alpha$, $\beta$ and $\gamma$ in the interval
[0, 1] on the set of basic configurations. From time $t$ to $t + 1$, the chain evolves
from the basic configuration $\tau = S_{\alpha\beta\gamma}(t)$ to a basic configuration $\tau' = S_{\alpha\beta\gamma}(t + 1)$
as follows:

• A wall $i$ is chosen uniformly at random among the $n + 1$ walls, and then
may become active with probability $\lambda(i)$, with $\lambda(i) = \alpha$ for $i = 1, \ldots, n - 1$,
$\lambda(0) = \beta$ and $\lambda(n) = \gamma$.

• If the wall did not become active, then nothing happens: $\tau' = \tau$.

• Otherwise the configuration may change near the active wall:

a. If the active wall is not on the border ($i \in \{1, \ldots, n - 1\}$) and has a
black particle on its left and a white one on its right, then these two
particles swap: $\bullet|\circ \to \circ|\bullet$.

b. If the active wall is the left border ($i = 0$) and the leftmost cell
contains a white particle, then the white particle leaves the system
and it is replaced by a black particle: $|\circ \to |\bullet$.

c. If the active wall is the right border ($i = n$) and the rightmost cell
contains a black particle, then the black particle leaves the system
and it is replaced by a white particle: $\bullet| \to \circ|$.

d. Otherwise the configuration is left unchanged.

The four cases a, b, c, d define an application $\vartheta$ from the set of configurations with
an active wall into the set of configurations, and, in terms of this application, the
chain can be described as: let $I(t)$ be a sequence of independent uniform random
variables on $\{0, \ldots, n\}$, and set

\[ S_{\alpha\beta\gamma}(t + 1) = \begin{cases} \vartheta(S_{\alpha\beta\gamma}(t), I(t)) & \text{with probability } \lambda(I(t)), \\ S_{\alpha\beta\gamma}(t) & \text{otherwise,} \end{cases} \]

where $\lambda(i) = \alpha$ for $i \in \{1, \ldots, n - 1\}$, $\lambda(0) = \beta$, and $\lambda(n) = \gamma$.
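The evolution rule above can be sketched in a few lines of Python. This is only an illustrative simulation, not part of the original paper: cells are encoded as 1 (black) and 0 (white), wall $i$ sits between cells $i$ and $i+1$, and the names `theta`, `rate`, `step` and `simulate` are hypothetical.

```python
import random

BLACK, WHITE = 1, 0


def theta(cells, i):
    """Apply cases a-d at active wall i to a tuple of n cells."""
    n = len(cells)
    new = list(cells)
    if 1 <= i <= n - 1 and cells[i - 1] == BLACK and cells[i] == WHITE:
        new[i - 1], new[i] = WHITE, BLACK   # case a: black hops over wall i
    elif i == 0 and cells[0] == WHITE:
        new[0] = BLACK                      # case b: a black particle enters
    elif i == n and cells[n - 1] == BLACK:
        new[n - 1] = WHITE                  # case c: a black particle exits
    return tuple(new)                       # case d: nothing changes


def rate(i, n, alpha, beta, gamma):
    """lambda(i): beta at the left border, gamma at the right, alpha inside."""
    if i == 0:
        return beta
    if i == n:
        return gamma
    return alpha


def step(cells, alpha, beta, gamma, rng):
    """One time step: pick a wall uniformly, activate it with probability lambda(i)."""
    n = len(cells)
    i = rng.randrange(n + 1)
    if rng.random() < rate(i, n, alpha, beta, gamma):
        return theta(cells, i)
    return cells


def simulate(cells, alpha, beta, gamma, steps, rng):
    """Return the trajectory [S(0), S(1), ..., S(steps)]."""
    trajectory = [tuple(cells)]
    for _ in range(steps):
        trajectory.append(step(trajectory[-1], alpha, beta, gamma, rng))
    return trajectory


if __name__ == "__main__":
    for state in simulate((0, 0, 0, 0), 1.0, 1.0, 1.0, 10, random.Random(0)):
        print("".join("•" if c else "◦" for c in state))
```

Keeping `theta` deterministic and putting all randomness in `step` mirrors the way the text separates the application from the choice of the active wall.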

As illustrated by Figure 3, during the evolution of the system, black par- 
ticles travel from left to right while white particles do the opposite. As already 
mentioned, one can equivalently think about white particles as empty cells. See 
Figure 10 for the entire system with n = 3. 




Figure 3. A possible evolution, with n = 4. The active wall 
triggering each transition is indicated. 




Jumping particles 



401 



1.2. A remarkable stationary distribution 

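Among many results on the TASEP, Derrida et al. [1, 3] proved the following nice
property of the system in the case $\alpha = \beta = \gamma = 1$. First,

\[ \mathrm{Prob}(S_{111}(t) \text{ contains 0 black particles}) \xrightarrow[t \to \infty]{} \frac{1}{C_{n+1}}, \tag{1} \]

where $C_{n+1} = \frac{1}{n+2}\binom{2n+2}{n+1}$ is the $(n+1)$th Catalan number. More generally, for all
$0 \le k \le n$,

\[ \mathrm{Prob}(S_{111}(t) \text{ contains } k \text{ black particles}) \xrightarrow[t \to \infty]{} \frac{\frac{1}{n+1}\binom{n+1}{k}\binom{n+1}{n-k}}{C_{n+1}}. \tag{2} \]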
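For small $n$ these limits can be checked exactly. The sketch below is an illustration under our own encoding (1 = black, 0 = white; all function names are hypothetical): it builds the transition matrix of the chain of Section 1.1, solves the stationarity equations over the rationals, and compares the law of the number of black particles with the ratio $\frac{1}{n+1}\binom{n+1}{k}\binom{n+1}{n-k} / C_{n+1}$.

```python
from fractions import Fraction
from itertools import product
from math import comb


def theta(cells, i):
    """Cases a-d of Section 1.1 at active wall i."""
    n = len(cells)
    new = list(cells)
    if 1 <= i <= n - 1 and cells[i - 1] == 1 and cells[i] == 0:
        new[i - 1], new[i] = 0, 1
    elif i == 0 and cells[0] == 0:
        new[0] = 1
    elif i == n and cells[n - 1] == 1:
        new[n - 1] = 0
    return tuple(new)


def solve(A, b):
    """Gauss-Jordan elimination over the rationals for a square system A x = b."""
    m = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(m):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][m] for r in range(m)]


def stationary(n, alpha=1, beta=1, gamma=1):
    """Exact stationary distribution of the basic chain (positive rates assumed)."""
    states = list(product((0, 1), repeat=n))
    index = {s: k for k, s in enumerate(states)}
    m = len(states)
    P = [[Fraction(0)] * m for _ in range(m)]
    for s in states:
        for i in range(n + 1):
            lam = Fraction(beta if i == 0 else gamma if i == n else alpha)
            P[index[s]][index[theta(s, i)]] += lam / (n + 1)
            P[index[s]][index[s]] += (1 - lam) / (n + 1)
    # Rows of A are the balance equations pi P = pi; the last is replaced by sum(pi) = 1.
    A = [[P[r][c] - int(r == c) for r in range(m)] for c in range(m)]
    A[-1] = [Fraction(1)] * m
    b = [Fraction(0)] * (m - 1) + [Fraction(1)]
    return dict(zip(states, solve(A, b)))


def black_count_law(n):
    """Stationary law of the number of black particles for alpha = beta = gamma = 1."""
    law = [Fraction(0)] * (n + 1)
    for state, p in stationary(n).items():
        law[sum(state)] += p
    return law


def narayana_law(n):
    """Right-hand side of Formula (2) for k = 0..n."""
    catalan = Fraction(comb(2 * n + 2, n + 1), n + 2)
    return [Fraction(comb(n + 1, k) * comb(n + 1, n - k), n + 1) / catalan
            for k in range(n + 1)]


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, black_count_law(n) == narayana_law(n))
```

Exact rational arithmetic is used instead of floating point so that the comparison with the closed formula is an equality test rather than a tolerance check.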

The finite state Markov chain $S_{111}$ is clearly ergodic so that the previous
limits are in fact the probabilities of the same events in the unique stationary
distribution of the chain [6]. More generally, Derrida et al. provided expressions for
the stationary probabilities of $S_{\alpha\beta\gamma}$. Since their original work a number of papers
have appeared, providing alternative proofs and further results on correlations,
time evolutions, etc. It should be moreover stressed that the model we presented is
a special case among the many existing variants of asymmetric exclusion processes.
See for instance the article [4] for recent advances and a bibliography. However,
the remarkable appearance of Catalan numbers is not easily understood from the
proofs in the physics literature. As far as we know, these proofs rely either on a
matrix ansatz, or on a Bethe ansatz, both being then proved by a recursion on $n$.

In a previous paper [5], we proposed a combinatorial proof of Formulas (1)
and (2) based on a combinatorial interpretation of the stationary distribution of
$S_{111}$. The aim of the present paper is to give a combinatorial derivation of the
general stationary distribution of $S_{\alpha\beta\gamma}$.

1.3. The complete system 

The main ingredient we introduced in [5] to study the TASEP consisted in a new
Markov chain $X_{111}$ on a set $\Omega_n$ of complete configurations, that satisfies two main
requirements. On the one hand, the stationary distribution of the basic chain $S_{111}$ can
be simply expressed in terms of that of the chain $X_{111}$. On the other hand, the
stationary behavior of the chain $X_{111}$ is easy to understand. The complete
configurations that we introduced for this purpose are made of two rows of $n$ cells
containing black and white particles. The first requirement was met by imposing
that in the first row, the chain $X_{111}$ simulates the chain $S_{111}$, i.e. $X_{111}$ is a
covering chain of $S_{111}$. The second requirement was met by adequately choosing the
complete configurations and the transition rules so that $X_{111}$ clearly has a
uniform stationary distribution. In this paper we shall proceed in an analogous way
and construct a complete chain $X_{\alpha\beta\gamma}$ on $\Omega_n$ that will allow us to describe the
stationary distribution of the basic chain $S_{\alpha\beta\gamma}$.

A complete configuration of $\Omega_n$ is a pair of rows of $n$ cells satisfying the
following constraints:

(i) The balance condition: The two rows contain together $n$ black and $n$ white
particles.

(ii) The positivity condition: On the left hand side of any vertical wall there
are no more white particles than black ones.

An example of complete configuration is given in Figure 4. In view of Formulas (1)
and (2), one first reason to introduce these complete configurations is that the
cardinal of $\Omega_n$ is $C_{n+1}$ and that, for all $0 \le k \le n$, the cardinal of the set
of complete configurations with $k$ black and $m = n - k$ white particles on
the top row is $\frac{1}{n+1}\binom{n+1}{k}\binom{n+1}{m}$. These formulas can be obtained in many ways, for
instance using the cycle lemma (see [5]), or through one-to-one correspondences
between complete configurations and bicolored Motzkin paths with $n$ steps, or
Dyck paths with $2n + 2$ steps [8, Chap. 6]. Yet another classical way to obtain
them is using generating functions, as we shall do in Section 2.

Figure 4. A complete configuration with $n = 10$.

Figure 5. A white sweep and a black sweep.
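The two counting statements can be confirmed by brute force for small $n$. The following sketch (our own encoding, 1 = black and 0 = white, with hypothetical function names) enumerates all pairs of rows, keeps those satisfying the balance and positivity conditions, and tallies them by the number of black particles in the top row.

```python
from itertools import product
from math import comb


def is_complete(top, bottom):
    """Balance and positivity conditions for a pair of rows."""
    excess = 0  # (number of black) - (number of white) to the left of the current wall
    for t, b in zip(top, bottom):
        excess += (2 * t - 1) + (2 * b - 1)
        if excess < 0:          # positivity fails at this wall
            return False
    return excess == 0          # balance: n black and n white in total


def complete_configurations(n):
    """Yield all complete configurations with n columns as (top, bottom) pairs."""
    for top in product((0, 1), repeat=n):
        for bottom in product((0, 1), repeat=n):
            if is_complete(top, bottom):
                yield top, bottom


def top_row_counts(n):
    """counts[k] = number of complete configurations with k black particles on top."""
    counts = [0] * (n + 1)
    for top, _ in complete_configurations(n):
        counts[sum(top)] += 1
    return counts


def catalan(n):
    return comb(2 * n, n) // (n + 1)


def narayana(n, k):
    """(1/(n+1)) * C(n+1, k) * C(n+1, n-k), the count stated in the text."""
    return comb(n + 1, k) * comb(n + 1, n - k) // (n + 1)


if __name__ == "__main__":
    for n in range(6):
        print(n, sum(top_row_counts(n)), catalan(n + 1), top_row_counts(n))
```

The enumeration visits $4^n$ pairs of rows, which is fine for the small sizes used here; it is meant as a check of the formulas, not as an efficient generator.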

The Markov chain $X_{\alpha\beta\gamma}$ on $\Omega_n$ will be defined in terms of an application $T$
from the set $\Omega_n \times \{0, \ldots, n\}$ to the set $\Omega_n$. This application, which we already used
in [5], is derived in Section 3 as the first component of a fundamental bijection $\bar{T}$
and can be conveniently described as follows. Given a complete configuration $\omega$ and
an active wall $i$, the actions of $T$ on the first row of $\omega$ do not depend on the second
row, and mimic the application $\vartheta$ describing the evolution of the Markov chain
$S_{\alpha\beta\gamma}$ in the cases a, b, c and d of the description of the basic TASEP. In particular
in the top row, black particles travel from left to right and white particles from
right to left. As opposed to that, in the bottom row, $T$ moves black and white
particles backward. In order to describe this, we first introduce the concept of
sweep (see Figure 5):

• A white sweep between walls $i_1$ and $i_2$ consists in all white particles of the
bottom row between walls $i_1$ and $i_2$ simultaneously hopping to the
right (some black particles thus being displaced to the left in order to fill
the gaps).

• A black sweep between walls $i_1$ and $i_2$ consists in all black particles of the
bottom row between walls $i_1$ and $i_2$ simultaneously hopping to the
left (some white particles thus being displaced to the right in order to fill
the gaps).

Next, around the active wall $i$, we distinguish the following walls: if $i \ne 0$, let
$j_1 < i$ be the leftmost wall such that there are only white particles in the top row
between walls $j_1$ and $i - 1$; if $i \ne n$, let $j_2 > i$ be the rightmost wall such that
there are only black particles in the top row between walls $i + 1$ and $j_2$.

With these definitions, we are in position to describe completely the
application $T$. Given a configuration $\omega \in \Omega_n$ and an active wall $i \in \{0, \ldots, n\}$, the
cases a, b, c and d of the basic chain describe the first row of $T(\omega, i)$, and they are
complemented as follows to give the second row:

a. Depending on whether the particle on the bottom right of the $i$th wall in $\omega$
is black or white, a white sweep occurs between $j_1$ and $i$, or a black one
between $i + 1$ and $j_2 + 1$ (see Figure 6).








b. The leftmost column of $\omega$ consists of a $|{\circ \atop \bullet}|$-column. These two particles
exchange (in agreement with the rule for the top row), and a black sweep
occurs between the left border and wall $j_2 + 1$.

c. The rightmost column of $\omega$ consists of a $|{\bullet \atop \circ}|$-column. These two particles
exchange (in agreement with the rule for the top row), and a white sweep
occurs between wall $j_1$ and the right border.

d. As in the top row, nothing happens in the bottom row.

The Markov chain $X_{\alpha\beta\gamma}$ is the Markov chain on the set $\Omega_n$ of complete
configurations that is defined from the application $T$ exactly as the TASEP is
described from $\vartheta$: the evolution rule from time $t$ to $t + 1$ consists in choosing
$i = I(t)$ uniformly at random in $\{0, \ldots, n\}$ and setting

\[ X_{\alpha\beta\gamma}(t + 1) = \begin{cases} T(X_{\alpha\beta\gamma}(t), i) & \text{with probability } \lambda(i), \\ X_{\alpha\beta\gamma}(t) & \text{otherwise,} \end{cases} \]

where $\lambda(i) = \alpha$ for $i \in \{1, \ldots, n - 1\}$, $\lambda(0) = \beta$, and $\lambda(n) = \gamma$.

By construction, the Markov chains $S_{\alpha\beta\gamma}$ and $X_{\alpha\beta\gamma}$ are related by

\[ S_{\alpha\beta\gamma} = \mathrm{top}(X_{\alpha\beta\gamma}), \]

where $\mathrm{top}(\omega)$ denotes the top row of a complete configuration $\omega$, and the $=$ is
intended as identity in law. An appealing interpretation from a combinatorial
point of view is that we have revealed a circulation of the particles, which use the
bottom row to travel backward and implement the infinite reservoirs.



1.4. The stationary distribution of the complete system 

In order to express the stationary distribution of the chain $X_{\alpha\beta\gamma}$, we introduce two
combinatorial statistics and use them to associate a weight $q(\omega)$ to each complete
configuration.

By definition, a complete configuration $\omega$ is a concatenation of four types
of columns $|{\bullet \atop \bullet}|$, $|{\bullet \atop \circ}|$, $|{\circ \atop \bullet}|$ and $|{\circ \atop \circ}|$, subject to the balance and positivity conditions.
Observe that the concatenation of two complete configurations of $\Omega_i$ and $\Omega_j$ with
$i + j = n$ yields a complete configuration of $\Omega_n$. Let us call prime a configuration
that cannot be decomposed in this way. A complete configuration $\omega$ can be uniquely
written as a concatenation $\omega = \omega_1 \cdots \omega_m$ of prime configurations. These prime
factors can be of three types: $|{\bullet \atop \circ}|$-columns, $|{\circ \atop \bullet}|$-columns, and blocks of the form
$|{\bullet \atop \bullet}|\,\omega'\,|{\circ \atop \circ}|$ with $\omega'$ a complete configuration. The inner part $\omega'$ of a block $\omega = |{\bullet \atop \bullet}|\,\omega'\,|{\circ \atop \circ}|$
is referred to as its inside.

Now, given a complete configuration $\omega$, let us assign labels to some of the
black and white particles of its bottom row: a white particle is labeled x if it is
not in a block, and a black particle is labeled y if it is not in the inside of a block



Figure 7. A configuration $\omega$ with its weight $q(\omega)$. Labels are indicated below particles.



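and if on its left hand side, all white particles of the bottom row belong to some block. Let us denote
by $n_x(\omega)$ and $n_y(\omega)$ respectively the number of labels of type x and the number
of labels of type y in the configuration $\omega$. Then the weight of a configuration $\omega$ is
defined as

\[ q(\omega) = \beta^n \gamma^n \left( \frac{\alpha}{\beta} \right)^{n_y(\omega)} \left( \frac{\alpha}{\gamma} \right)^{n_x(\omega)}. \]

An instance is given in Figure 7; more generally the weight is a monomial with
total degree $2n$.

Theorem 1.1. The Markov chain $X_{\alpha\beta\gamma}$ is ergodic and has the following stationary
distribution:

\[ \mathrm{Prob}(X_{\alpha\beta\gamma}(t) = \omega) \xrightarrow[t \to \infty]{} \frac{q(\omega)}{Z_n}, \quad \text{where } Z_n = \sum_{\omega' \in \Omega_n} q(\omega'), \]

where $q(\omega)$ is the previously defined weight on complete configurations.

In particular for $\alpha = \beta = \gamma = 1$, $q(\omega) = 1$ for all configurations and we
recover uniformity as in [5],

\[ \mathrm{Prob}(X_{111}(t) = \omega) \xrightarrow[t \to \infty]{} \frac{1}{|\Omega_n|} = \frac{1}{C_{n+1}}. \]

The Markov chain $X_{\alpha\beta\gamma}$ is clearly aperiodic, and the fact that it is irreducible
follows from the irreducibility of $S_{111}$ done in [5]. This granted, it is sufficient for
the proof of Theorem 1.1 to show that the distribution induced by the weights $q$ is
stationary. We shall use an alternate description of $T$, which relies on the following
result proved in Section 3.

Theorem 1.2. There exists a bijection $\bar{T}$ from $\Omega_n \times \{0, \ldots, n\}$ onto itself such that

• the application $T$ is the first component of $\bar{T}$: for all $\omega$ and $i$,
$\bar{T}(\omega, i) = (\omega', j) \Rightarrow T(\omega, i) = \omega'$,

• the bijection $\bar{T}$ transports weights: for all $(\omega', j) = \bar{T}(\omega, i)$,

\[ \lambda(j) q(\omega') = \lambda(i) q(\omega), \]

where $\lambda(i) = \alpha$ for $i \in \{1, \ldots, n - 1\}$, $\lambda(0) = \beta$ and $\lambda(n) = \gamma$.

In order to see that the distribution induced by $q$ is stationary, we assume
that

\[ \mathrm{Prob}(X_{\alpha\beta\gamma}(t) = \omega) = \frac{q(\omega)}{Z_n} \quad \text{for all } \omega \in \Omega_n, \]

and try to compute $\mathrm{Prob}(X_{\alpha\beta\gamma}(t + 1) = \omega')$.

Recall that $I(t)$ denotes the wall chosen at time $t$, so that $X_{\alpha\beta\gamma}(t + 1) =
T(X_{\alpha\beta\gamma}(t), I(t))$ if $I(t)$ became active, and define $J(t + 1)$ by
$\bar{T}(X_{\alpha\beta\gamma}(t), I(t)) = (X_{\alpha\beta\gamma}(t + 1), J(t + 1))$
if $I(t)$ became active, or by $J(t + 1) = I(t)$ otherwise. Then, in view of the first
point in Theorem 1.2,

\[ \mathrm{Prob}(X_{\alpha\beta\gamma}(t + 1) = \omega') = \sum_{j=0}^{n} \mathrm{Prob}(X_{\alpha\beta\gamma}(t + 1) = \omega', J(t + 1) = j). \]

Now, by definition of the Markov chain $X_{\alpha\beta\gamma}$, for all $\omega'$ and $j$,

\[ \mathrm{Prob}(X_{\alpha\beta\gamma}(t + 1) = \omega', J(t + 1) = j)
 = \lambda(i) \cdot \mathrm{Prob}(X_{\alpha\beta\gamma}(t) = \omega, I(t) = i)
 + (1 - \lambda(j)) \cdot \mathrm{Prob}(X_{\alpha\beta\gamma}(t) = \omega', I(t) = j), \]

where $(\omega, i) = \bar{T}^{-1}(\omega', j)$. Since the random variable $I(t)$ is uniform on $\{0, \ldots, n\}$,
we get

\[ \mathrm{Prob}(X_{\alpha\beta\gamma}(t + 1) = \omega', J(t + 1) = j)
 = \lambda(i) \frac{q(\omega)}{Z_n} \cdot \frac{1}{n + 1} + (1 - \lambda(j)) \frac{q(\omega')}{Z_n} \cdot \frac{1}{n + 1}. \]

But according to the second point in Theorem 1.2, $\lambda(i) q(\omega) = \lambda(j) q(\omega')$ so that
the terms involving $\lambda$ cancel. Finally

\[ \mathrm{Prob}(X_{\alpha\beta\gamma}(t + 1) = \omega') = \sum_{j=0}^{n} \frac{q(\omega')}{Z_n} \cdot \frac{1}{n + 1} = \frac{q(\omega')}{Z_n}, \]

and this completes the proof that the distribution induced by $q$ is stationary.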



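1.5. From the complete to the basic system.

The relation $S_{\alpha\beta\gamma} = \mathrm{top}(X_{\alpha\beta\gamma})$ now allows us to derive from Theorem 1.1 a
combinatorial interpretation for the basic system.

Theorem 1.3. Let $\mathrm{top}(\omega)$ denote the top row of a complete configuration $\omega$. Then
for any initial configurations $S_{\alpha\beta\gamma}(0)$ and $X_{\alpha\beta\gamma}(0)$ with $\mathrm{top}(X_{\alpha\beta\gamma}(0)) = S_{\alpha\beta\gamma}(0)$,
and any basic configuration $\tau$,

\[ \mathrm{Prob}(S_{\alpha\beta\gamma}(t) = \tau) = \mathrm{Prob}(\mathrm{top}(X_{\alpha\beta\gamma}(t)) = \tau) \xrightarrow[t \to \infty]{} \frac{1}{Z_n} \sum_{\{\omega \in \Omega_n \mid \mathrm{top}(\omega) = \tau\}} q(\omega). \]

In particular, in the case $\alpha = \beta = \gamma = 1$, we recover

\[ \mathrm{Prob}(S_{111}(t) = \tau) \xrightarrow[t \to \infty]{} \frac{|\{\omega \in \Omega_n \mid \mathrm{top}(\omega) = \tau\}|}{C_{n+1}}, \]

which yields, with $m = n - k$,

\[ \mathrm{Prob}(S_{111}(t) \text{ contains } k \text{ black particles}) \xrightarrow[t \to \infty]{} \frac{\frac{1}{n+1}\binom{n+1}{k}\binom{n+1}{m}}{C_{n+1}}. \]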



As discussed in Section 4 this interpretation sheds a new light on some recent 
results of Derrida et al connecting the TASEP to Brownian excursions [2]. 
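Theorem 1.3 can be tested exactly for small $n$ and rational rates. The sketch below is an illustration only: it assumes the reading of the labels in which x marks each isolated column with a white bottom particle, and y marks each bottom black particle that starts a prime factor before the first x; every function name is hypothetical. It compares the exact stationary distribution of the basic chain with the normalized sums of weights $q(\omega)$ over complete configurations with a given top row.

```python
from fractions import Fraction
from itertools import product


def theta(cells, i):
    """Cases a-d of the basic chain at active wall i (1 = black, 0 = white)."""
    n, new = len(cells), list(cells)
    if 1 <= i <= n - 1 and cells[i - 1] == 1 and cells[i] == 0:
        new[i - 1], new[i] = 0, 1
    elif i == 0 and cells[0] == 0:
        new[0] = 1
    elif i == n and cells[n - 1] == 1:
        new[n - 1] = 0
    return tuple(new)


def solve(A, b):
    """Gauss-Jordan elimination over the rationals."""
    m = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(m):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][m] for r in range(m)]


def stationary(n, alpha, beta, gamma):
    """Exact stationary distribution of the basic chain (positive rates assumed)."""
    states = list(product((0, 1), repeat=n))
    index = {s: k for k, s in enumerate(states)}
    m = len(states)
    P = [[Fraction(0)] * m for _ in range(m)]
    for s in states:
        for i in range(n + 1):
            lam = Fraction(beta if i == 0 else gamma if i == n else alpha)
            P[index[s]][index[theta(s, i)]] += lam / (n + 1)
            P[index[s]][index[s]] += (1 - lam) / (n + 1)
    A = [[P[r][c] - int(r == c) for r in range(m)] for c in range(m)]
    A[-1] = [Fraction(1)] * m
    return dict(zip(states, solve(A, [Fraction(0)] * (m - 1) + [Fraction(1)])))


def weight(top, bottom, alpha, beta, gamma):
    """q(omega), or None if (top, bottom) is not a complete configuration."""
    n, nx, ny, level, y_alive = len(top), 0, 0, 0, True
    for t, b in zip(top, bottom):
        if level == 0:              # this column starts a prime factor
            if b == 0:              # white bottom particle outside any block: label x
                nx, y_alive = nx + 1, False
            elif y_alive:           # black bottom particle, no unblocked white on its left
                ny += 1
        level += t + b - 1
        if level < 0:
            return None             # positivity condition violated
    if level != 0:
        return None                 # balance condition violated
    return beta ** n * gamma ** n * (alpha / beta) ** ny * (alpha / gamma) ** nx


def predicted(n, alpha, beta, gamma):
    """Right-hand side of Theorem 1.3 for every basic configuration."""
    totals = {}
    for top in product((0, 1), repeat=n):
        for bottom in product((0, 1), repeat=n):
            q = weight(top, bottom, alpha, beta, gamma)
            if q is not None:
                totals[top] = totals.get(top, 0) + q
    z_n = sum(totals.values())
    return {top: w / z_n for top, w in totals.items()}


if __name__ == "__main__":
    a, b, c = Fraction(1, 2), Fraction(1, 3), Fraction(3, 4)
    for n in (1, 2, 3):
        print(n, stationary(n, a, b, c) == predicted(n, a, b, c))
```

For $n = 2$ the check can also be done by hand: the four top rows $\circ\circ$, $\circ\bullet$, $\bullet\circ$, $\bullet\bullet$ receive total weights $\alpha^2\gamma^2$, $\alpha^2\beta\gamma$, $\alpha\beta\gamma(\beta+\gamma)$ and $\alpha^2\beta^2$, which solve the balance equations of the basic chain.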







1.6. Continuous-time descriptions of the TASEP 

In the physics literature, the TASEP is usually described in the following terms.
The time is continuous, and one considers walls where a move can take place: at
any time, wall $i$ has probability $\lambda(i)\,dt$ to trigger a move $\omega \to \vartheta(\omega, i)$ during time
interval $dt$ (the rate $\lambda(i)$ is defined as previously).

Following the probabilistic literature [7], one can give a formulation which is
equivalent to the previous one, but already closer to ours. In this description, each
wall waits for an independent exponential random time with rate 1 before waking
up (in other terms, the probability that wall $i$ will still be sleeping in $t$ seconds is
$e^{-t}$). When wall $i$ wakes up, it has probability $\lambda(i)$ to become active. If this is the
case, then the transition $\omega \to \vartheta(\omega, i)$ is applied to the current configuration $\omega$. In
any case the wall falls asleep again, restarting its clock.

This continuous-time TASEP is now easily coupled to the Markov chain $S_{\alpha\beta\gamma}$.
Let the time steps of $S_{\alpha\beta\gamma}$ correspond to the succession of moments at which a
wall wakes up. Then in both versions, the index of the next wall to wake up is at any
time a uniform random variable on $\{0, \ldots, n\}$, and when a wall wakes up the
transition probabilities are identical. This implies that the stationary distributions
of the continuous-time TASEP and its Markov chain replica are identical.
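The key point of the coupling, that the next wall to wake up is uniform on $\{0, \ldots, n\}$, can be illustrated by a small simulation. This is a sketch under our own conventions (hypothetical function names, a fixed seed for reproducibility); because exponential clocks are memoryless, redrawing all clocks after each wake-up has the same law as letting the other clocks keep running.

```python
import random


def next_wake_up(n, rng):
    """Draw independent rate-1 exponential clocks for walls 0..n.

    Returns (wall, waiting_time) for the first clock to ring.
    """
    times = [rng.expovariate(1.0) for _ in range(n + 1)]
    wall = min(range(n + 1), key=times.__getitem__)
    return wall, times[wall]


def wake_up_statistics(n, draws, seed=0):
    """Empirical frequency of each wall and mean waiting time between wake-ups."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    total_wait = 0.0
    for _ in range(draws):
        wall, wait = next_wake_up(n, rng)
        counts[wall] += 1
        total_wait += wait
    return [c / draws for c in counts], total_wait / draws


if __name__ == "__main__":
    frequencies, mean_wait = wake_up_statistics(3, 30000)
    print(frequencies)   # each entry should be close to 1/4
    print(mean_wait)     # should be close to 1/(n+1) = 0.25
```

The mean waiting time $1/(n+1)$ only rescales time, which is why it does not affect the stationary distribution.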

1.7. Outline of the rest of the paper. 

In Section 2 an approach to compute explicit quantities from the combinatorial 
interpretation is briefly exposed. Theorem 1.2 is proved in Section 3. Finally some 
concluding remarks are gathered in Section 4. 



2. Enumeration 



Let us introduce the weighted generating function of complete configurations with
respect to their length:

\[ Z(t; u, v) = \sum_{n \ge 0} \sum_{\omega \in \Omega_n} t^n u^{n_x(\omega)} v^{n_y(\omega)}, \quad \text{so that} \quad Z\left(\beta\gamma t; \frac{\alpha}{\gamma}, \frac{\alpha}{\beta}\right) = \sum_{n \ge 0} Z_n t^n. \]

The decomposition of a configuration at its first block yields

\[ Z(t; u, v) = 1 + tv Z(t; u, v) + tu Z(t; u, 1) + t^2 v Z(t; 1, 1) Z(t; u, v). \]

Solving this equation yields

\[ Z(t; u, v) = \frac{2 - u - v + uv - 2tuv - (u + v - uv)\sqrt{1 - 4t}}{2(1 - u + tu^2)(1 - v + tv^2)}. \]

Extracting coefficients in this expression allows us to recover for instance a formula
for $Z_n$. One could also have taken into account the number of particles in the top
row in the equation. We do not pursue this line since $Z_n$ was obtained in other
ways and largely studied as a function of $\alpha$, $\beta$, $\gamma$ in the physics literature.
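As a numerical cross-check of the closed form, the sketch below compares it at one sample point with a truncated sum over enumerated configurations, and also evaluates the residual of the functional equation written as $Z(t;u,v) = 1 + tvZ(t;u,v) + tuZ(t;u,1) + t^2 v Z(t;1,1) Z(t;u,v)$. The label rule coded here (x for an isolated column with white bottom particle, y for a bottom black particle starting a prime factor before the first x) is our reading of Section 1.4, and all names are hypothetical.

```python
from itertools import product
from math import sqrt


def labels(top, bottom):
    """Return (n_x, n_y), or None if (top, bottom) is not a complete configuration."""
    nx = ny = level = 0
    y_alive = True
    for t, b in zip(top, bottom):
        if level == 0:              # column starting a prime factor
            if b == 0:
                nx, y_alive = nx + 1, False
            elif y_alive:
                ny += 1
        level += t + b - 1          # +1 for two blacks, -1 for two whites, else 0
        if level < 0:
            return None
    return (nx, ny) if level == 0 else None


def z_series(t, u, v, max_n):
    """Truncated generating function: sum over n <= max_n of t^n u^n_x v^n_y."""
    total = 0.0
    for n in range(max_n + 1):
        for top in product((0, 1), repeat=n):
            for bottom in product((0, 1), repeat=n):
                lab = labels(top, bottom)
                if lab is not None:
                    total += t ** n * u ** lab[0] * v ** lab[1]
    return total


def z_closed(t, u, v):
    """Closed form of Z(t; u, v) from Section 2."""
    s = sqrt(1.0 - 4.0 * t)
    numerator = 2 - u - v + u * v - 2 * t * u * v - (u + v - u * v) * s
    return numerator / (2 * (1 - u + t * u * u) * (1 - v + t * v * v))


def residual(t, u, v):
    """Left side minus right side of the functional equation for the closed form."""
    z = z_closed(t, u, v)
    rhs = 1 + t * v * z + t * u * z_closed(t, u, 1.0) + t * t * v * z_closed(t, 1.0, 1.0) * z
    return z - rhs


if __name__ == "__main__":
    t, u, v = 0.02, 0.5, 0.7
    print(z_series(t, u, v, 7), z_closed(t, u, v), residual(t, u, v))
```

At $u = v = 1$ the closed form reduces to $(1 - 2t - \sqrt{1-4t})/(2t^2) = \sum_{n \ge 0} C_{n+1} t^n$, in agreement with $|\Omega_n| = C_{n+1}$.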



3. The bijection T 

The aim of this section is to prove Theorem 1.2, thus giving an alternate description
of the transformation $T$ and a case by case analysis of its action on the weight $q$.
We need the following properties, the verification of which is left to the reader.

Property 3.1. Let $\omega$ be a complete configuration belonging to $\Omega_n$. We have the
following structure properties:

1. In a local configuration $|{\bullet \atop \cdot}|{\circ \atop \bullet}|$ (where the dot stands for an arbitrary particle)
the black particle in the bottom row of the second column never contributes a label y.

2. The white particle in the bottom row of a $|{\circ \atop \circ}|$-column never has a label x.

and movement properties:

i. The deletion/insertion of a $|{\circ \atop \bullet}|$-column does not change the labels of other
particles,

ii. The deletion/insertion of a pair * | ^ taking the form | * | ° | ^ | ° | does not 
change the labels of other particles. 

From now on in this section, $(\omega, i)$ denotes an element of the current class,
and $(\omega', j)$ its image by $\bar{T}$. In the pairs $(\omega, i)$ and $(\omega', j)$, $i$ and $j$ refer to walls
of the configurations $\omega$ and $\omega'$, and $i$ is called the active wall of $\omega$. Following the
notations of Section 1, when $i \ne 0$, we also consider $j_1 < i$ the smallest integer
such that in the top row of $\omega$ all cells between walls $j_1$ and $i - 1$ contain white
particles. Symmetrically, when $i \ne n$, we consider $j_2 > i$ the largest integer such
that in the top row of $\omega$ all cells between walls $i + 1$ and $j_2$ contain black particles.

To define the bijection T and prove Theorem 1.2 we shall partition the set 
Qn X {0, ...,n} into classes Aa', Aa'^, Aa",_Aa'j, At^, A^ Ac^, Ad, and de- 
scribe, for each class Az, its image Bz = f{Az) under the action of f and the 
corresponding variation of the weight q: 

The active wall of a; separates in the top row a black particle P and a white particle Q. 
Then in the top row the particles P and Q swap. In the bottom row, the sweep that 
occurs depends on the type of the particle R that is below Q in uj (see Figure 8): 

Aa' The particle R is black and the wall j\ is different from 0. Then j = ji 
and, in the bottom row, a white sweep occurs between walls j and i. The 
new configuration uj' belongs to Indeed uj' can also be described as 
obtained from uj by moving a | ° | -column from the right of the ith wall 
to the right of the Jith. But moving a |°|-column has no effect on the 
positivity constraint. 

Now we want to compare q{uj) and q{uj'). According to Property 3.1.1 
the particle R does not contribute a label y neither in uj nor in uj'. More- 
over, according to Property 3.1.i, the displacement of the | ° |-column does 
not affect labels of other particles. Hence q{uj) = q{uj')^ in agreement with 

Mi) = \{j). 

The image Ba' of the class Aa' consists of pairs {(jo' , j), j > 0, with a 
1 1 |-column on the right hand side of the jth wall of uj' and such that the 
sequence of white particles on the right hand side of the jth wall in the 
top row is followed by a black particle. 

A_{a'}^0 The particle R is black and the wall j_1 = 0. Then j = 0 and a white sweep
occurs between walls 0 and i. The new configuration ω' still belongs to Ω_n
since it is again obtained from ω by moving a |°|-column from the right
of the ith wall to the right of the wall 0.

As opposed to the previous case. Property 3.1.1 applies only to uj: 
in cj', the displaced | ° [-column is the leftmost one, so that its black par- 
ticle contributes a supplementary y label. Therefore q{uj') = q{uj)^, in 
agreement with A(i) == a, A(0) = f3. 

The image B_{a'}^0 consists of pairs (ω', 0) such that the top row starts
on the left with a non-empty sequence of white particles followed by a black
one.











Figure 8. Jump moves in the case •|o o|«. 



A_{a''} The particle R is white and the wall j_2 is different from n. Then j = j_2 and,
in the bottom row, a black sweep occurs between walls i + 1 and j + 1.
The new configuration ω' belongs to Ω_n. Indeed ω' can be described as
obtained from ω by moving a *|°-diagonal from the ith wall to the j_2th
wall: this movement has no effect on the positivity constraint.

From Property 3.1.2 we see that the particle R contributes
a label y neither in ω nor in ω', and from Property 3.1.ii the displacement
of the *|°-diagonal does not affect the labels of other particles. Hence q(ω) =
q(ω'), in agreement with λ(i) = λ(j).

The image B_{a''} of the class A_{a''} consists of pairs (ω', j) with a |°|-column
on the right hand side of the jth wall and such that the sequence
of black particles on the left hand side of the jth wall in the top row is
followed by a white particle.

A_{a''}^n The particle R is white and the wall j_2 is equal to n. Then j = n and, in
the bottom row, a black sweep occurs between walls i + 1 and n. The new
configuration ω' still belongs to Ω_n since it is obtained from ω by removing
a *|°-diagonal around the ith wall and inserting a |*|-column to the left
of the nth wall.

As opposed to the previous case. Property 3.1.2 applies only to lu: 
the inserted |*|-column is the rightmost one, so that its white particle 
contributes an x label. Therefore q{ou') = in agreement with A(i) = 

a and A(n) = 7. 








Figure 9. Active left border and active right border with respec- 
tively a black and a white particle in the top row. 



The image B_{a''}^n of the class A_{a''}^n consists of pairs (ω', n) such that
there is a sequence of black particles on the left hand side of the nth wall
in the top row, followed by a white particle.

The active wall of ω is the left border with a white particle Q on its right in the top
row. Again, the cell under Q must contain a black particle R (see Figure 9). First
the two particles Q and R exchange to form a |*|-column. Then we have two cases:

A_b The wall j_2 is not n. Then j = j_2 and, in the bottom row, a black sweep
occurs between walls 1 and j + 1. The configuration ω' belongs to Ω_n.
Indeed no black particle moves to the right.

Equivalently, the new configuration uj' is obtained by inserting a * | ^ - 
diagonal at the wall j»2 and deleting the first | ° |-column. According to 
Properties S.l.i and S.l.ii only the labels of the displaced particles are 
affected. Since the deleted | ^ |-column is the leftmost, it contributes a y 
label in a;. As opposed to that. Property 3.1.2 forbids the * | ^-diagonal to 
contribute a x label. Therefore q{oo') = in agreement with A(0) = j3 

and \{j) = a. 

The image B_b consists of pairs (ω', j) with a |°|-column on the right
of the jth wall of ω' and such that the sequence of black particles on the left
of the jth wall in the top row ends at the border.

Ab^ the wall j 2 is equal to n. Then j = n and, in the bottom row, a black 
sweep occurs between walls 1 and n. Finally, q{uj') = q{uj)^^ = q{uj)^, in 

agreement with A(0) = (3 and \{n) — 7. 

The image B_b^n is reduced to the configuration with all black particles
in the top row.

The active wall of ω is the right border with a black particle Q on its left in the top
row. The cell under Q must contain a white particle R (see Figure 9). First the
particles Q and R exchange to form a |°|-column. There are then two cases:

A_c The wall j_1 is different from 0. Then j = j_1 and, in the bottom row, a white
sweep occurs between walls j and n − 1. The configuration ω' belongs to
Ω_n since the transformation amounts to moving and flipping a |*|-column.

Equivalently, uj' is obtained by inserting a |°|-column at the left 
of the wall n and deleting the |°|-column on the right of j\. According 
to Property S.l.i only the labels of displaced particles can be affected. 
Since the deleted |*|-column is the rightmost column, its white particle 
contributes an x label in cj. As opposed to that. Property 3.1.1 forbids 





the |°|-column to contribute a label in uo' . Therefore q{oo') = q{u)^, in 
agreement with A(n) =7 and \{j) = a. 

The image B_c consists of pairs (ω', j) with a |°|-column on the right
hand side of the jth wall of ω' and such that the sequence of white particles
on the right hand side of the jth wall in the top row ends at the right
border.

Ac^ the wall ji is equal to 0. Then j — 0 and in the bottom row, a white sweep 
occurs between walls 0 and n — 1. Finally, q{tu') = 

agreement with A(n) = 7 and A(0) = /?. 

The image B_c^0 is reduced to the configuration with all white particles
in the top row.

A_d This class contains all the remaining cases. On these pairs the application T has
no effect, that is, for (ω, i) ∈ A_d, T(ω, i) = (ω, i). In particular the weights are left
unchanged.

The observation that the image classes {B_{a'}, B_{a'}^0, B_{a''}, B_{a''}^n, B_b, B_b^n, B_c, B_c^0, B_d}
form a partition of Ω_n × {0, ..., n} completes the proof of Theorem 1.2.



4. Conclusions and links to Brownian excursions 

The starting point of the paper [5] was a “combinatorial ansatz” : the stationary 
distribution of the TASEP can be expressed in terms of Catalan numbers hence 
should have a nice combinatorial interpretation. As we have seen in the present 
paper, our approach is natural enough to extend to more general TASEP. 

We do not claim that our combinatorial interpretation is of any physical rele- 
vance. However, as already pointed out in [5] , apart from explaining the occurrence 
of “magical” Catalan numbers in the problem, it sheds a new light on the recent 
results of Derrida et al. [2] connecting the TASEP with Brownian excursion. More
precisely, using explicit calculations, Derrida et al. show that when α = β = γ = 1,
the density of black particles in configurations of the TASEP can be expressed in
terms of a pair (e_t, b_t) of independent processes, a Brownian excursion e_t and
a Brownian motion b_t. In our interpretation these two quantities appear at the
discrete level, associated to each complete configuration ω of Ω_n:

• The role of the Brownian excursion is played for ω by the halved differences
e(i) = ½(B(i) − W(i)) between the number of black and white particles
sitting on the left of the ith wall, for i = 0, ..., n. By definition of complete
configurations, (e(i))_{i=0,...,n} is a discrete excursion, that is, e(0) = e(n) =
0, e(i) ≥ 0 and |e(i) − e(i − 1)| ∈ {0, 1}, for i = 1, ..., n.

• The role of the Brownian motion is played for ω by the differences b(i) =
B_top(i) − B_bot(i) between the number of black particles sitting in the top
and in the bottom row, on the left of the ith wall, for i = 0, ..., n. This
quantity (b(i))_{i=0,...,n} is a discrete walk, with |b(i) − b(i − 1)| ∈ {0, 1} for
i = 1, ..., n.

Since e(i) + b(i) = 2B_top(i) − i, these quantities allow to describe the cumulated
number of black particles in the top row of a complete configuration. Accordingly,
the density in a given segment (i, j) is

\[ \frac{B_{top}(j) - B_{top}(i)}{j-i} = \frac{1}{2} + \frac{e(j) - e(i)}{2(j-i)} + \frac{b(j) - b(i)}{2(j-i)}. \]

This is a discrete version of the quantity considered by Derrida et al. in [2].
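
The two discrete walks and the density identity can be checked on small cases. The following is a minimal sketch, not the authors' code: it assumes that a complete configuration is a pair of rows of n cells (1 for a black particle, 0 for a white one) whose walk e is an excursion, as stated in the first bullet above. The function names are hypothetical.

```python
from itertools import product

BLACK, WHITE = 1, 0  # encoding chosen for this sketch


def walks(top, bottom):
    """Return (e, b, b_top), each a list indexed by the walls 0..n."""
    e, b, b_top = [0], [0], [0]
    for t, s in zip(top, bottom):
        blacks = t + s                    # black particles in this column
        whites = 2 - blacks
        e.append(e[-1] + (blacks - whites) // 2)   # halved difference B - W
        b.append(b[-1] + t - s)           # black on top minus black below
        b_top.append(b_top[-1] + t)
    return e, b, b_top


def is_complete(top, bottom):
    """Assumed definition: e stays non-negative and returns to 0."""
    e, _, _ = walks(top, bottom)
    return min(e) >= 0 and e[-1] == 0


def complete_configurations(n):
    rows = list(product((BLACK, WHITE), repeat=n))
    return [(t, s) for t in rows for s in rows if is_complete(t, s)]


def density_from_walks(top, bottom, i, j):
    """Right-hand side of the density formula on the segment (i, j)."""
    e, b, _ = walks(top, bottom)
    return 0.5 + (e[j] - e[i]) / (2 * (j - i)) + (b[j] - b[i]) / (2 * (j - i))
```

For n = 3 the enumeration should return the 14 complete configurations mentioned in the caption of Figure 11.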







Now the two walks e(i) and b(i) are correlated since one is stationary when
the other is not, and vice versa: |e(i) − e(i − 1)| + |b(i) − b(i − 1)| = 1. Given
ω, let I_e = {α_1 < ... < α_p} be the set of indices of the columns containing two
particles of the same colour, and I_b = {β_1 < ... < β_q} the set of indices of the
columns containing one particle of each colour (p + q = n).
Then the walk with steps e'(i) = e(α_i) − e(α_i − 1) is the excursion obtained from e
by ignoring stationary steps, and the walk with steps b'(i) = b(β_i) − b(β_i − 1) is
obtained from b in the same way. Conversely, given a simple excursion e' of length p,
a simple walk b' of length q and a subset I_e of {1, ..., p + q} of cardinal p, two
correlated walks e and b, and thus a complete configuration ω, can be uniquely
reconstructed. The consequence of this discussion is that the uniform distribution
on Ω_n (which is stationary for α = β = γ = 1) corresponds to the uniform
distribution of triples (I_e, e', b') where, given I_e, e' and b' are independent.
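
As a sanity check on this decomposition, one can count the triples directly. The sketch below assumes the standard facts that simple excursions of even length p are counted by the Catalan number of index p/2 and that there are 2^q simple walks of length q; the helper names are hypothetical.

```python
from math import comb


def catalan(m):
    """m-th Catalan number."""
    return comb(2 * m, m) // (m + 1)


def count_triples(n):
    """Number of triples (I_e, e', b') for configurations of size n.

    Choose the p positions of I_e among the n columns, a simple excursion e'
    of (necessarily even) length p, and a simple walk b' of length n - p.
    """
    return sum(comb(n, p) * catalan(p // 2) * 2 ** (n - p)
               for p in range(0, n + 1, 2))
```

For n = 3 this gives 8 + 6 = 14, in agreement with the number of complete configurations quoted for Figure 11, and more generally the count coincides with the Catalan number of index n + 1.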

A direct computation shows that in the large n limit, with probability ex- 
ponentially close to 1, a random configuration lu is described by a pair (e',6') of 
walks of roughly equal lengths n/2 + In particular up to multiplicative 

constants the normalized pairs (^77^, and 7^^) both converge 

to the same pair (et^bt) of independent processes, with et a standard Brownian 
excursion and bt a standard Brownian walk. 

We thus obtained a combinatorial interpretation of the appearance of the pair
(e_t, b_t) in the TASEP at α = β = γ = 1. How do these considerations extend
to other TASEP? The case α much smaller than β and γ essentially reduces to
α = β = γ = 1 on a system of length n − 2 (with border cells acting as reservoirs).
The case β and γ smaller than α appears more interesting to consider: at a rough
level, the weights force e(t) to spend more time at the value exactly zero and favor
negative values of b(t). The derivation of the associated continuum quantities in this
case could be of some interest.

Another challenge raised by our approach is to give an explicit construction
of a continuum TASEP by taking the limit of the Markov chain viewed as a
Markov chain on pairs of walks. An appealing way to give a geometric meaning to
the transitions in the continuum limit could be to use a representation in terms of
parallelogram polyominoes, where the process e(t) (or e_t in the continuum limit)
describes the width of the polyomino and b(t) (or b_t in the continuum limit)
describes its vertical displacement.








Figure 10. The basic configurations for n = 3 and transitions 
between them. The start point of each arrow indicates the wall 
triggering the transition. The numbers are the stationary proba- 
bilities. 




Figure 11. The 14 complete configurations for n = 3 and tran- 
sitions between them. The start point of each arrow indicates the 
wall triggering the transition (loop transitions are not indicated). 
Stationary probabilities are uniform (equal to 1/14) since each 
configuration has equal in and out degrees. Ignoring bottom rows 
reduces this Markov chain to the chain of Figure 10. 



References 

[1] B. Derrida, E. Domany, and D. Mukamel. An exact solution of a one dimensional 
asymmetric exclusion model with open boundaries. J. Stat. Phys., 69:667—687, 1992. 

[2] B. Derrida, C. Enaud, and J. L. Lebowitz. The asymmetric exclusion process and 
Brownian excursions. Available electronically as arXiv:cond-mat/0306078. 

[3] B. Derrida, M.R. Evans, V. Hakim, and V. Pasquier. Exact solution of a one- 
dimensional asymmetric exclusion model using a matrix formulation. J. Phys. A: 
Math., 26:1493-, 1993. 







[4] B. Derrida, J. L. Lebowitz, and E. R. Speer. Exact large deviation functional of a 
stationary open driven diffusive system: the asymmetric exclusion process. Available 
electronically as arXiv: cond-mat/0205353. 

[5] E. Duchi and G. Schaeffer. A combinatorial approach to jumping particles I: maximal 
flow regime. In proceedings of FPSAC’04, 2004. 

[6] O. Häggström. Finite Markov Chains and Algorithmic Applications. Cambridge
University Press, 2002.

[7] T. M. Liggett. Interacting Particle Systems. Springer, New York, 1985. 

[8] R. Stanley. Enumerative Combinatorics, volume II. Cambridge University Press, 
1999. 

Enrica Duchi 

CAMS, EHESS, 87, bd Raspail, 75006 Paris, France

Enrica.Duchi@ehess.fr

Gilles Schaeffer 

LIX, CNRS - Ecole Polytechnique, 91128 Palaiseau, France 

Gilles.Schaeffer@lix.polytechnique.fr 




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Stochastic Deformations of Sample Paths of 
Random Walks and Exclusion Models 

Guy Fayolle and Cyril Furtlehner 



ABSTRACT: This study is centered on models accounting for stochastic
deformations of sample paths of random walks, embedded either in Z^2 or in Z^3. These
models are immersed in multi-type particle systems with exclusion. Starting from
examples, we give necessary and sufficient conditions for the underlying Markov
processes to be reversible, in which case their invariant measure has a Gibbs form.
Letting the size of the sample path increase, we find the convenient scalings
bringing to light phase transition phenomena. Stable and metastable configurations
are bound to time-periods of limiting deterministic trajectories which are solutions
of nonlinear differential systems: in the example of the ABC model, a system of
Lotka-Volterra class is obtained, and the periods involve elliptic, hyper-elliptic or
more general functions. Lastly, we discuss briefly the contour of a general approach
allowing to tackle the transient regime via differential equations of Burgers' type.



1. Introduction 

We are interested in models describing evolution of sample paths of random walks,
when they are submitted to random local deformations involving possibly several
links. Roughly speaking, given a finite sample path, say of size N, forming a not
necessarily closed curve, the problem will be to characterize the evolution of an
associated family {Y_i, i = 1, ..., N} of Markov processes in the thermodynamic
limit as N → ∞. This requires guessing and finding the interesting scalings.

In a previous study [7], we considered random walks on a square lattice, deforma- 
tions involved pairs of links and occurred at the epochs of Poisson jump processes 
in continuous time (see section 2 for a more exact definition). The analysis was 
carried out by means of an explicit mapping, which led to view the system as a 
coupling of two exclusion processes. 

Starting from a number of observations, we intend to hint in this paper that 
the model in [7] can indeed be cast into a broader class, the ultimate goal being 
to propose methods of wide applicability concerning the following questions: 

• conditions ensuring Gibbs states and explicit forms of the corresponding 
invariant measures; 

• steady-state equations in the thermodynamic limit as N → ∞, and their
solutions in the case of Gibbs states, but also in situations involving
permanent currents;

• hydrodynamic and transient equations, when N is sufficiently large, yield- 
ing thus a complete picture of the evolution. 

Generalizations of the model in [7] can follow two natural trends. First, in
modifying the construction of the random walk. Indeed, in the square lattice, we
dealt with a 4-letter alphabet. Considering instead a finite alphabet of l letters is
then tantamount to constructing random walks with oriented links, whose affixes
are the l-th roots of unity e^{2ikπ/l}, k = 0, ..., l − 1. The case l = 2 corresponds
to the simple exclusion process in Z, and l = 3 yields the so-called ABC model.

Another possible extension is to relax the constraint that the walk lives in Z^2
and to define a stochastic deformation process in higher dimension. In the sequel,
we shall restrict ourselves to some paradigms in Z^2 and Z^3.

In section 2, we define a class of two-dimensional models, together with related
patterns in Z^3, in terms of exclusion particle systems. Section 3 is devoted to
stochastic reversibility of the Markov processes of interest and to the Gibbs form 
of their invariant measure. In section 4, the non-symmetric classical ABC model 
is solved (fundamental scaling, phase transitions, classification of stable configura- 
tions) through the analysis of a Lotka-Volterra differential system. The concluding 
section 5 gives a brief overview of ongoing research about large scale dynamics, 
nonequilibrium and transient regimes. 



2. Model descriptions via exclusion particle systems 

Our main objective in this section is to show how the evolution of the sample 
paths of random walks can be fruitfully described by means of particle exclusion 
processes. 

Beforehand, to avoid repetition and clumsy notation, let us emphasize that 
we shall only deal with jump Markov processes in continuous time. So implicitly 
the word transition rate will always refer to some underlying generator. Also, N 
will always stand for the size of the sample path, or equivalently the number of its 
links. 

2.1. Preliminaries 

In 1} , the simple exclusion model coincides with the well known KPZ system 
(see e.g. [9]), which represents a fluctuating and eventually growing interface. This 
system is coded by a sequence of binary variables {τ_j}, j = 1, ..., N, depending on
whether a particle is present or not, with asymmetric jump rates. This system has 
been extensively studied. In particular, the invariant measure has been obtained 
in a closed matrix form solution, for fairly arbitrary parameters and boundary 
conditions [3]. Large scale dynamics has also been analyzed [14], showing Burgers’ 
equations [1]. 

2.2. 2-dimensional models 

1) The triangular lattice and the ABC model. Here the evolution of the random
walk is restricted to the triangular lattice. Each link (or step) of the walk is either
1, e^{2iπ/3} or e^{4iπ/3}, and quite naturally will be said to be of type A, B and C,
respectively. This corresponds to the so-called ABC model, since there is a coding
by a 3-letter alphabet. The set of transitions (or reactions) is given by

\[ AB \underset{p^-}{\overset{p^+}{\rightleftarrows}} BA, \qquad BC \underset{q^-}{\overset{q^+}{\rightleftarrows}} CB, \qquad CA \underset{r^-}{\overset{r^+}{\rightleftarrows}} AC, \tag{1} \]

where the introduced rates are fixed, but not necessarily equal. Also we impose
periodic boundary conditions on the sample paths. This model was first introduced
in [4] in the context of particles with exclusion, and a Gibbs form corresponding
to reversibility has been found in [5] in some cases.
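
To make the reactions concrete, here is a minimal simulation sketch of the ABC model on a ring, written as a continuous-time jump chain. It is an illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, and the convention that AB becomes BA at rate p+ (and BA becomes AB at rate p-, similarly for q and r) is an assumption of this sketch.

```python
import cmath
import random
from collections import Counter

# Affixes of the three link types on the triangular lattice.
AFFIX = {"A": 1 + 0j,
         "B": cmath.exp(2j * cmath.pi / 3),
         "C": cmath.exp(4j * cmath.pi / 3)}


def make_rates(p_plus, p_minus, q_plus, q_minus, r_plus, r_minus):
    """rates[(x, y)] is the rate at which adjacent links xy become yx."""
    return {("A", "B"): p_plus, ("B", "A"): p_minus,
            ("B", "C"): q_plus, ("C", "B"): q_minus,
            ("C", "A"): r_plus, ("A", "C"): r_minus}


def jump(path, rates, rng):
    """Perform one exchange in place and return the exponential holding time."""
    n = len(path)
    bonds = [i for i in range(n) if path[i] != path[(i + 1) % n]]
    weights = [rates[path[i], path[(i + 1) % n]] for i in bonds]
    total = sum(weights)
    if total == 0:                        # nothing can move
        return float("inf")
    i = rng.choices(bonds, weights=weights)[0]
    j = (i + 1) % n                       # periodic boundary conditions
    path[i], path[j] = path[j], path[i]
    return rng.expovariate(total)


def endpoint(path):
    """Vector joining the two extremities of the walk."""
    return sum(AFFIX[x] for x in path)
```

Since every reaction only exchanges two neighbouring links, both the number of links of each type and the end-to-end vector are left unchanged by `jump`; for the word AABBCC the curve is closed and stays closed.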







2) The square lattice and coupled exclusion model. This model was introduced
in [7] to analyze stochastic deformations of a walk in the square lattice, and it will
be referred to from now on as the {τ_a τ_b} model. Assuming links counterclockwise
oriented, the following transitions can take place.





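\[ AB \underset{\lambda^{ba}}{\overset{\lambda^{ab}}{\rightleftarrows}} BA, \quad BC \underset{\lambda^{cb}}{\overset{\lambda^{bc}}{\rightleftarrows}} CB, \quad CD \underset{\lambda^{dc}}{\overset{\lambda^{cd}}{\rightleftarrows}} DC, \quad DA \underset{\lambda^{ad}}{\overset{\lambda^{da}}{\rightleftarrows}} AD, \]

\[ AC \underset{\delta^{bd}}{\overset{\gamma^{ac}}{\rightleftarrows}} BD, \quad BD \underset{\delta^{ca}}{\overset{\gamma^{bd}}{\rightleftarrows}} CA, \quad CA \underset{\delta^{db}}{\overset{\gamma^{ca}}{\rightleftarrows}} DB, \quad DB \underset{\delta^{ac}}{\overset{\gamma^{db}}{\rightleftarrows}} AC. \]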





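We studied a rotation invariant version of this model when

\[ \begin{cases} \lambda^+ = \lambda^{ab} = \lambda^{bc} = \lambda^{cd} = \lambda^{da}, \\ \lambda^- = \lambda^{ba} = \lambda^{cb} = \lambda^{dc} = \lambda^{ad}, \\ \gamma^+ = \gamma^{ac} = \gamma^{bd} = \gamma^{ca} = \gamma^{db}, \\ \gamma^- = \delta^{ac} = \delta^{bd} = \delta^{ca} = \delta^{db}. \end{cases} \]

Define the mapping (A, B, C, D) → (τ^a, τ^b) ∈ {0, 1}^2, such that

\[ \begin{cases} A \to (0,0), \\ B \to (1,0), \\ C \to (1,1), \\ D \to (0,1). \end{cases} \]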

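Then the dynamics can be formulated in terms of coupled exclusion processes. The
evolution of the sample path is represented by a Markov process with state space
the 2N binary random variables {τ_j^a} and {τ_j^b}, j = 1, ..., N, taking value 1 if a
particle is present and 0 otherwise. The jump rates to the right (+) or to the left
(−) are given by

\[ \begin{cases} \lambda_a^{\pm}(i) = \bar\tau_i^b \bar\tau_{i+1}^b \lambda^{\mp} + \tau_i^b \tau_{i+1}^b \lambda^{\pm} + \bar\tau_i^b \tau_{i+1}^b \gamma^{\mp} + \tau_i^b \bar\tau_{i+1}^b \gamma^{\pm}, \\ \lambda_b^{\pm}(i) = \bar\tau_i^a \bar\tau_{i+1}^a \lambda^{\pm} + \tau_i^a \tau_{i+1}^a \lambda^{\mp} + \bar\tau_i^a \tau_{i+1}^a \gamma^{\pm} + \tau_i^a \bar\tau_{i+1}^a \gamma^{\mp}, \end{cases} \tag{2} \]

with \bar\tau = 1 − \tau.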



Notably, one sees the jump rates of a given sequence are locally conditionally 
defined by the complementary sequence. 
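
The way the four-letter reactions turn into conditional jump rates can be tabulated mechanically. The sketch below is only an illustration under the following assumptions: the coding A = (0,0), B = (1,0), C = (1,1), D = (0,1); exchanges of cyclically consecutive letters at rates λ+ (forward) and λ− (backward); fold rotations AC → BD → CA → DB → AC at rate γ+ and their reverses at rate γ−. The rate names are plain strings and all identifiers are hypothetical.

```python
CODE = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}
CYCLE = "ABCD"


def reactions():
    """Return {(source pair, target pair): rate name} for the 16 reactions."""
    table = {}
    for k in range(4):
        x, y = CYCLE[k], CYCLE[(k + 1) % 4]
        z, w = CYCLE[(k + 2) % 4], CYCLE[(k + 3) % 4]
        table[(x + y, y + x)] = "lambda+"   # e.g. AB -> BA
        table[(y + x, x + y)] = "lambda-"   # e.g. BA -> AB
        table[(x + z, y + w)] = "gamma+"    # e.g. AC -> BD (fold rotation)
        table[(y + w, x + z)] = "gamma-"    # e.g. BD -> AC (reverse rotation)
    return table


def describe(src, dst):
    """Return (family, direction, complementary values) for src -> dst."""
    (a0, b0), (a1, b1) = CODE[src[0]], CODE[src[1]]
    (c0, d0), (c1, d1) = CODE[dst[0]], CODE[dst[1]]
    a_moved = (a0, a1) != (c0, c1)
    b_moved = (b0, b1) != (d0, d1)
    assert a_moved != b_moved               # exactly one family moves
    if a_moved:
        family, before, after, other = "a", (a0, a1), (c0, c1), (b0, b1)
    else:
        family, before, after, other = "b", (b0, b1), (d0, d1), (a0, a1)
    assert after == before[::-1]            # a particle hops across the bond
    direction = "+" if before == (1, 0) else "-"
    return family, direction, other


def conditional_rates():
    """Map (family, direction, complementary pair) to a rate name."""
    return {describe(src, dst): rate for (src, dst), rate in reactions().items()}
```

Every reaction moves exactly one particle of one family across one bond, and the rate depends only on the values of the other family at the two sites involved, which is the local conditioning noted above.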



3) An extended stochastic clock model. We propose an extension of the
preceding model for an arbitrary two-dimensional regular graph. To this end,
consider a random walk composed of oriented links, the affixes of which take values
ω_k = exp(2ikπ/n), k = 1, ..., n, the n-th roots of unity. There are two different
situations.



(a) n = 2p + 1 is odd. Then the walk cannot have a fold of two successive
links, so that local displacements of edges can only be performed by exchanging
two successive links. Let X = {X^1, ..., X^n} denote the particle types viewed as
letters of an alphabet, and let {λ^{kl}} be the transition rates. The set of reactions
is defined by

\[ X^k X^l \underset{\lambda^{lk}}{\overset{\lambda^{kl}}{\rightleftarrows}} X^l X^k, \qquad k \in [1, 2p+1],\ k \ne l. \]

These rules provide an extension of the ABC model, which we shall discuss in
detail in section 3.







(b) When n = 2p is even, the grammar is altered when two successive links
fold, so that this elementary transition amounts merely to a rotation of the fold of
angle 2π/n (instead of a π rotation which would occur when exchanging the two
links). The situation is thus slightly more complicated and the set of reactions is
now given by

\[ \begin{cases} X^k X^l \underset{\lambda^{lk}}{\overset{\lambda^{kl}}{\rightleftarrows}} X^l X^k, & k = 1, \dots, n,\ l \ne k+p, \\ X^k X^{k+p} \underset{\delta^{k+1}}{\overset{\gamma^k}{\rightleftarrows}} X^{k+1} X^{k+p+1}, & k = 1, \dots, n, \end{cases} \tag{3} \]

where k + p is taken modulo n. It is worth noting that γ^k (resp. δ^k) concern folds
rotating in the counterclockwise direction (resp. clockwise) and that the number of
letters of each type is no longer conserved. In other words, odd models give rise to
pure diffusions with eventual drifts, whereas even models are truly reaction-diffusion
models.



2.3. 3-dimensional generalizations of the coupled exclusion model on a diamond
lattice

2.3.1. Elementary deformations. Actually, the diamond lattice formulation
of stochastic deformations in Z^3 provides several straightforward generalizations
of the 2-dimensional {τ_a τ_b} model. Indeed, between two nodes there are 8 = 2^3
possible links (the jumps of the sample path). Let (τ_a, τ_b, τ_c) be the vector of binary
components corresponding to a displacement in each direction, where a, b and c
denote here the three particle families (letters).

As in the 2-dimensional model, elementary deformations consist in exchanging
between neighbouring sites the value of one of the binary components. By
construction this model gives a kind of geometric decoupling between the three
types of particles. In fact all possible existing stochastic couplings result solely from
the conditional transition rates λ_a^±(τ_b, τ_c), λ_b^±(τ_c, τ_a), and λ_c^±(τ_a, τ_b), according to
the various possible models. In order to obtain some non-trivial dynamical effect,
we couple the three systems of particles, keeping in mind the possibility of getting
back the 2-dimensional {τ_a τ_b} model under certain conditions. This is the subject
of the next two paragraphs.

2.3.2. A LINEAR COUPLED EXCLUSION MODEL Here we propose a coupling which 
is linear with respect to the complementary fields, via the following intensities: 

I = A±2(rf-Tf)//6, 

[A±(z) = A±2(r“-rf)/ic- 

Suppose for a while = 0 and pa = Hh = P- Then the sequence {taUc} 
remains disordered, which means the marginal law of rf is As for the subsystem 
{Ta,Tb}, up to random contributions (rf — ^)/i, the transition rates Aj and A^ 
correspond to a particular definition of the {rau} model. Hence, in the limit, we 
obtain a kind of disordered {raTb} model. 







2.3.3. A NON-LINEAR COUPLED EXCLUSION MODEL It is also expected to recover 
the TaU model when one of the components is completely frozen in an ordered 
phase, taking for example rf = 0 for i = 1, . . . , iV. This situation is fulfilled by 
choosing the following non-linear couplings 

W = ^ ± i.Ti - 

< AfW = A±(rf-ff)(rf-ffK 
^A±(i) = A±(rf-ff)(rf-f!>K. 



2.4. Boundary conditions 

Although they live in Z^2 or Z^3, let us emphasize that all the objects considered
throughout this study are curves, hence one-dimensional dynamical systems.

Hence, for any given sample path of size N, there are two links referred to as
site 1 and N, at which boundary conditions have to be specified. We shall consider
only periodic boundary conditions: this means essentially the system is invariant
under circular permutation of the sites. Consequently, certain geometric quantities
locally conserved will remain also globally conserved. For example, the distance
between the two extremities (not necessarily distinct) of a curve remains constant,
so that closed curves will stay closed forever.



3. Reversibility, Gibbs measures and equilibrium states 

In the sequel, our goal will be to find exact scalings permitting to derive phase
transition conditions, for deformation processes of the class defined in section 2,
as N → ∞. In fact, for the sake of shortness, we shall restrict ourselves to case
studies where the sample paths have a Gibbs invariant measure, but this is by no
means a necessary constraint, as commented in section 5.



Consider a system having a state space of the form W = A^S, where A and
S are finite sets. The following notion, adapted from [12], will be sufficient for our
purpose.



Definition 3.1. For any finite set R ⊂ S, let {V_R} be a collection of real numbers.
A probability measure ζ on W is said to be a Gibbs state or a Gibbs measure
relative to the potential {V_R} if, for all w = (w_1, ..., w_{|S|}) ∈ W,

\[ \zeta(w) = \frac{1}{Z} \exp\Big[ \sum_{R} V_R \prod_{i \in R} w_i \Big], \]

where Z is a normalizing constant.

Conditions will now be established, which are either of a geometrical nature 
or bear directly on the transition rates, for the processes of interest to be reversible: 
detailed balance equations hold and they indeed suffice to equilibrate all possible 
cycles in the state space (see e.g. [10]). In that case a potential does exist and the 
invariant measure can be explicitly expressed as a Gibbs measure. 







3.1. Odd alphabet 

For 2-dimensional walks, when the size n of the alphabet is odd, as already noticed
in section 2.2, there is a dual point of view saying that an n-species particle system
moves in a one-dimensional lattice: there is exactly one particle per site and the
transition rates correspond to exchanges of a particle k with a particle l between
adjacent sites. In a very different context, this model was proposed in [5], from
which we extract some results pertaining to our topic.

Up to a slight abuse in the notation, we let X_i^k ∈ {0, 1} denote the binary
random variable representing the occupation of site i by a letter of type k. The state
of the system is represented by the array X = {X_i^k, i = 1, ..., N; k = 1, ..., n} of
size N × n. Then the invariant measure of the associated Markov process is given
by

\[ P(X) = \frac{1}{Z} \exp[-\mathcal{H}(X)], \tag{4} \]



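where

\[ \mathcal{H}(X) = \sum_{i<j} \sum_{k,l} \alpha^{kl} X_i^k X_j^l, \tag{5} \]

Z is the normalizing constant, and

\[ \alpha^{kl} - \alpha^{lk} = \log\frac{\lambda^{kl}}{\lambda^{lk}}, \]

provided that the following condition holds

\[ \sum_{k \ne l} (\alpha^{kl} - \alpha^{lk}) N^k = 0, \qquad l = 1, \dots, n. \tag{6} \]

Indeed, a typical balance equation reads

\[ \frac{P[\dots, X_i^k = 1, X_{i+1}^l = 1, \dots]}{P[\dots, X_i^l = 1, X_{i+1}^k = 1, \dots]} = \frac{\lambda^{lk}}{\lambda^{kl}} = \exp[\alpha^{lk} - \alpha^{kl}], \tag{7} \]

and relation (6) proceeds directly from enforcing the above measure to be invariant
by translation.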
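
The role of the translation-invariance condition can be checked numerically on a small ring. The sketch below is a hedged illustration for three species only: it assumes a weight proportional to exp(−H), with H a sum over pairs of sites i < j of coefficients α depending on the ordered pair of letters, where α for (A,B), (B,C), (C,A) is the logarithm of the corresponding rate ratio and the three other coefficients vanish. All names are hypothetical.

```python
import itertools
import math


def energy(cfg, alpha):
    """H = sum over site pairs i < j of alpha[(letter at i, letter at j)]."""
    n = len(cfg)
    return sum(alpha.get((cfg[i], cfg[j]), 0.0)
               for i in range(n) for j in range(i + 1, n))


def detailed_balance_holds(counts, rates, tol=1e-9):
    """Check P(c) * rate(c -> c') == P(c') * rate(c' -> c) on every bond.

    rates[(x, y)] is the rate of the exchange xy -> yx between adjacent
    sites; the normalizing constant cancels, so it is never computed.
    """
    alpha = {("A", "B"): math.log(rates["A", "B"] / rates["B", "A"]),
             ("B", "C"): math.log(rates["B", "C"] / rates["C", "B"]),
             ("C", "A"): math.log(rates["C", "A"] / rates["A", "C"])}
    letters = "".join(x * m for x, m in counts.items())
    n = len(letters)
    for cfg in set(itertools.permutations(letters)):
        for i in range(n):
            j = (i + 1) % n               # the last site is next to the first
            x, y = cfg[i], cfg[j]
            if x == y:
                continue
            new = list(cfg)
            new[i], new[j] = y, x
            lhs = math.exp(-energy(cfg, alpha)) * rates[x, y]
            rhs = math.exp(-energy(new, alpha)) * rates[y, x]
            if not math.isclose(lhs, rhs, rel_tol=tol):
                return False
    return True
```

With three equal coefficients α, the balance equations hold on all bonds, including the one joining site N to site 1, when the three species are equally numerous; with unequal populations the interior bonds still balance but the closing bond does not.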

3.2. Even alphabet 

When the cardinal of the alphabet is even, say n = 2p, the situation is rendered
a bit more involved due to possible rotations of consecutive folded links. There is no
longer a conservation law for the number of letters in each class, and one should
instead introduce the quantities

\[ \Delta^k \overset{\text{def}}{=} N^{k+p} - N^k, \qquad k = 1, \dots, p-1, \]

which represent the differences between populations of links with opposite
directions. Moreover, as a rule, some non-trivial cycles in the state-space are not
balanced (see figure 1), unless transition rates satisfy additional conditions. This
gives rise to the next theorem.
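
A short sketch may help to see why the population differences, rather than the populations themselves, are the natural quantities in the even case. It assumes letters numbered 1, ..., n with affixes exp(2ikπ/n) and a fold rotation sending the pair (k, k+p) to (k+1, k+p+1) modulo n; the function names are hypothetical, and the differences are listed here for every pair of opposite directions.

```python
import cmath
from collections import Counter


def affix(k, n):
    """Affix of a link of type k: an n-th root of unity."""
    return cmath.exp(2j * cmath.pi * k / n)


def rotate_fold(path, i, n):
    """Rotate the fold carried by sites i, i+1 (letters in 1..n, n even)."""
    p = n // 2
    k, l = path[i], path[i + 1]
    if (l - k) % n != p:
        raise ValueError("sites i and i+1 do not carry a fold")
    new = list(path)
    new[i] = k % n + 1                    # k -> k+1, with n wrapping to 1
    new[i + 1] = l % n + 1
    return new


def differences(path, n):
    """Population differences between opposite link types k+p and k."""
    p = n // 2
    c = Counter(path)
    return [c[k + p] - c[k] for k in range(1, p + 1)]


def endpoint(path, n):
    return sum(affix(k, n) for k in path)
```

Rotating a fold changes the number of letters of four types at once, but the two links of a fold cancel each other, so the end-to-end vector and the differences between opposite populations are unchanged.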

Theorem 3.2. Assume n = 2p and periodic boundary conditions. Then the system 
is reversible if and only if the following conditions are imposed on the rates and 
on the particles numbers: 



(i) n 



_ = 1,V/ = l,...,n 








Figure 1. Elementary cycles: fold (a), 3-link motion (b), square 
loop (c). 



^kl I 

^kl 

(Hi) XI ^ = 0, k=l,...,n. 

l^k+p 

The result relies on the next lemma. 



Lemma 3.3. In the case of periodic boundary conditions, if the invariant measure 
has a Gibbs form given by (4) and (5), then the following relationships must hold: 



(iv) 



^kl _ ^Ik _ 



Xik’ 



^/e+l,fc+P+l _^k,k-\-p 



fc = 1, . . . ,n, Z 7 ^ fc -hp; 

/y^ 

= k = l,...,n; 



(v) there exists a constant a 6 M such that 

^ki ^ ^k+p,i _ jk ^ ^i,k+p = a,'ik,l = l,...,n-, 

P-1 

{vi) — Q!^^)A^ = 0, k = \,...,n. 

i=i 



Proof: We only present the main steps. Condition (iv) in the lemma comes from
a balance equation of type (7). The case l = k + p corresponds to adjacent folded
links, and, after setting



U{k,i) =*' X] XI 

I j>i+l 



- -a''+P’')Xj, 



^fc-j-p-|-l, I 








V{k,i) = 

I j<i 

equation (7) has to be replaced by 

P[. . . , Xf = 1, xf = 1, . . .] 



= exp [a''+P+^’ '=+1 - a^’ *=+p - U{k, i) - V{k, i)] , 
which leads to impose condition (v). 

To take into account the invariance by translation, let σ denote a circular
permutation among the indices, such that σ(i) = 1 + i mod (N). Consider

^<r{x)= Y1 

l>k,l>N i<j 



the resulting energy obtained after applying permutation a. Then 

AT N 

?f^(X) = J{(X) + ^Xf - a^'^)N‘, 

k=l 1=1 



where N^l is the number of links l. Since Δ^l, l = 1, ..., p is conserved (but
N^l is not), the rule (v) leads to the translational invariance in the form of (vi).

As to reversibility, one can check that (i) and (ii) are necessary to equilibrate
the cycles depicted in figure 1. Moreover, the cycle condition imposed by a circular
permutation is exactly given by (iii). Indeed, this cycle is performed by transporting
one particle through the system from site 1 to site N: during this operation,
a tagged particle k will encounter the N^l particles corresponding to each other
species l ≠ k and the resulting transition weight is then given by

\[ \prod_{l \ne k,\, k+p} \Big( \frac{\lambda^{kl}}{\lambda^{lk}} \Big)^{N^l}, \]

which in turn, by using (ii), amounts to condition (iii) after taking the logarithm.
These three conditions are in fact sufficient to determine the parameters {α^{kl}} in
order to solve (iv), (v), (vi), thus ensuring reversibility.



4. Steady state of the ABC system in the thermodynamic limit 

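As an illustration, we will present a detailed analysis of the thermodynamic
equilibrium situation in the case of the ABC model. When there are three particle
species, the form H of (5) comes to

\[ \mathcal{H}(\{X\}) = \sum_{i<j} \alpha^{ab} A_i B_j + \alpha^{bc} B_i C_j + \alpha^{ca} C_i A_j, \tag{8} \]

where the constants take the values

\[ \alpha^{ab} = \log\frac{p^+}{p^-}, \qquad \alpha^{bc} = \log\frac{q^+}{q^-}, \qquad \alpha^{ca} = \log\frac{r^+}{r^-}, \]

and the constraints (6) now become

\[ \frac{N_A}{\alpha^{bc}} = \frac{N_B}{\alpha^{ca}} = \frac{N_C}{\alpha^{ab}}. \tag{9} \]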







4.1. Scaling and Lotka-Volterra equations 

In the example of the ABC model [2], we have at hand an explicit analytic
expression for the invariant measure. In fact, our claim is that equations at steady state
in the thermodynamic limit can be derived by using a method proposed in [7] for
the square lattice model, which a priori does not require any explicit knowledge of
the invariant measure, which is most of the time intractable.

We shall only sketch the main lines of argument in the case of the ABC
model. The number of sites N is fixed and we are interested in the steady state
behaviour as t → ∞.

For the sake of shortness, let [A, B, C] = {(A_i, B_i, C_i), i = 1, ..., N} denote
the 3N-dimensional boolean vector representing the state of the Markov process.
Combining conditionings and couplings, the approach relies on the construction 
of a stochastic iterative rocking- scheme^ to generate the global invariant measure 
of the system. 

The algorithm constructs a convergent sequence of random kernels Q^{(n)}, n ≥ 0.

S1 Set n = 0 and take the system in some fixed state [A, B, C]^{(0)} = [A, B, C].

S2 At step n, assume we are given a 3-tuple [A, B, C]^{(n)} corresponding to an
admissible configuration (with respect to the ABC model). To construct
[A, B, C]^{(n+1)}, one allows each particle family to evolve as an independent
one-dimensional exclusion process until reaching equilibrium, with conditional
transition rates compatible with the rates of the original process
introduced in section 2.2. For instance, with regard to type A particles
(omitting sometimes the indexing by n for the sake of brevity), we have



$$\lambda_a^+(i) = p^+ B_i + r^- C_i + \Gamma B_i C_i,$$

$$\lambda_a^-(i+1) = p^- B_i + r^+ C_i + \Gamma B_i C_i,$$

these expressions being similar to the form of (2) in the $\{\tau_a\tau_b\}$ model.
$\Gamma$ is a non-zero quantity taking into account the exclusion at site $i$ in
the following sense: when BiCi = 1, then necessarily A^ = 1. In the free 
process, letting gf denote the random variable representing the conditional 
probability of finding a particle A^ at site i, we have 



A+(i) 
Xa {i -\- 1 ) 



.. -I (n) def Qi 

With u) ^ ^ 



Using the boolean nature of $B_i$ and $C_i$, we can write



log^ = a“Ci - 



with analogous quantities and for B and C species respectively. 
S3 Clearly, the equilibrium states thus obtained are not all admissible, since at 
a site i, several particles of different types may coexist. Therefore, we only 

retain configurations satisfying the constraint $A_i + B_i + C_i = 1$, for all $i$, the distribution of which, after some algebra, takes the form
Q([(A, B, I [(^, B, = I n , 

i=l 

where $Z$ is a normalizing constant. Then do $n \leftarrow n + 1$ and go to S2.

Let T Then, by means of transfer and coupling theorems, 

one can prove the existence of the random vectors 






Fundamental scaling We introduce the so-called fundamental scaling defined by 



= -^ + o( 


' 1 ), 




), a^b^Z^oi 




N ' 


-.Nr 


N \N, 


r N ' 


-.nJ 



where $\alpha, \beta, \gamma$ are three positive real constants.

Limiting equations Set from now on $x \stackrel{\text{def}}{=} i/N$, for $1 \le i \le N$. Then one can
show the weak limits

= lim p’’{x) - lim rlf^, p‘^{x) = Jim 

N^OO V— voo iV— »-oo 

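exist and satisfy the system of deterministic differential equations

$$\begin{cases} \dfrac{d\rho_a}{dx} = \rho_a(\beta\rho_c - \gamma\rho_b), \\[4pt] \dfrac{d\rho_b}{dx} = \rho_b(\gamma\rho_a - \alpha\rho_c), \\[4pt] \dfrac{d\rho_c}{dx} = \rho_c(\alpha\rho_b - \beta\rho_a), \end{cases} \qquad (11)$$

with the crucial constraints due to the periodic boundary conditions

$$\rho_u(x + 1) = \rho_u(x), \quad \forall u = a, b, c. \qquad (12)$$

Proof: Starting from (10), one applies the law of large numbers and ergodic theorems to justify the approximation of finite sums by Riemann integrals, as $N \to \infty$.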

It is amusing to see that (11) belongs to the class of generalized Lotka-Volterra 
systems. The original Lotka-Volterra model was the simplest model of predator- 
prey interactions, proposed independently by Lotka (1925) and Volterra (1926), 
see for instance [13]. Nonetheless, in our case the world is less cruel and all particle 
types are treated on an equal footing...! 

Now the form of (11) lends itself to a reasonably explicit solution in terms of special functions. The first step is to note the existence of two level surfaces

$$\rho_a + \rho_b + \rho_c = 1, \qquad (13)$$

$$\rho_a^{\alpha}\,\rho_b^{\beta}\,\rho_c^{\gamma} = \kappa, \qquad (14)$$

where (13) follows at once from (9) and $\kappa$ is a constant of motion to be determined. Using (13) to eliminate $\rho_c$, we rewrite (14) as

$$\rho_a^{\alpha}\,\rho_b^{\beta}\,(1 - \rho_a - \rho_b)^{\gamma} = \kappa. \qquad (15)$$
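The two level surfaces can be checked numerically. The following Python sketch integrates system (11) with a classical Runge-Kutta scheme and compares the values of (13) and (14) before and after the integration. The parameter values, the initial densities, the step size and all function names are illustrative assumptions; none of them come from the text.

```python
def rhs(rho, alpha, beta, gamma):
    """Right-hand side of the Lotka-Volterra system (11)."""
    a, b, c = rho
    return (a * (beta * c - gamma * b),
            b * (gamma * a - alpha * c),
            c * (alpha * b - beta * a))


def rk4_step(rho, h, alpha, beta, gamma):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(k, scale):
        return tuple(x + scale * dx for x, dx in zip(rho, k))

    k1 = rhs(rho, alpha, beta, gamma)
    k2 = rhs(shifted(k1, h / 2), alpha, beta, gamma)
    k3 = rhs(shifted(k2, h / 2), alpha, beta, gamma)
    k4 = rhs(shifted(k3, h), alpha, beta, gamma)
    return tuple(x + h / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
                 for x, d1, d2, d3, d4 in zip(rho, k1, k2, k3, k4))


def invariants(rho, alpha, beta, gamma):
    """Return the left-hand sides of (13) and (14)."""
    a, b, c = rho
    return a + b + c, a ** alpha * b ** beta * c ** gamma


# Hypothetical parameters and initial densities (assumptions for illustration).
ALPHA, BETA, GAMMA = 2.0, 3.0, 4.0
rho = (0.5, 0.3, 0.2)

START_SUM, START_KAPPA = invariants(rho, ALPHA, BETA, GAMMA)
for _ in range(2000):
    rho = rk4_step(rho, 1e-3, ALPHA, BETA, GAMMA)
END_SUM, END_KAPPA = invariants(rho, ALPHA, BETA, GAMMA)
```

Both quantities should agree up to the discretization error of the scheme, which is what one expects from constants of motion.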







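The change of functions $u = \rho_a + \rho_b$ and $v = \beta\rho_a - \alpha\rho_b$ yields the first order nonlinear differential equation

$$\frac{du}{dx} = (1 - u)\,v(u), \qquad (16)$$

where $v(u)$ satisfies the equation

$$(\alpha u + v)^{\alpha}(\beta u - v)^{\beta}(1 - u)^{\gamma} = \kappa\,(\alpha + \beta)^{\alpha+\beta}. \qquad (17)$$

Formally, $u(x)$ can be expressed as

$$x = \int \frac{du}{(1 - u)\,v(u)}. \qquad (18)$$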



It appears that the ratios in (9) are rational for all finite $N$, but, since we have let $N, N_A, N_B \to \infty$, they might become arbitrary real numbers in the interval $[0, 1]$. When they are rational, (16) is a polynomial equation, and then $u(x)$ is in some sense just a bit more general than a hyperelliptic function, since, after some algebra, we are left with integrals of the form

j y^[s 2 - a 2 (l - s)~?]ds, 

where $p$ and $q$ stand for positive integers. The particular case $\alpha = \beta = \gamma$ is rather
simple, since then




showing to be a standard Jacobi elliptic function. 



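For any given $\kappa$, the system (11) admits a unique solution. In particular, there is always a degenerate solution reduced to the fixed point

$$\rho_a = \frac{\alpha}{\alpha + \beta + \gamma}, \quad \rho_b = \frac{\beta}{\alpha + \beta + \gamma}, \quad \rho_c = \frac{\gamma}{\alpha + \beta + \gamma}, \qquad (19)$$

and corresponding to the constant

$$\tilde\kappa \stackrel{\text{def}}{=} \frac{\alpha^{\alpha}\beta^{\beta}\gamma^{\gamma}}{(\alpha + \beta + \gamma)^{\alpha+\beta+\gamma}}.$$

The purpose of the next paragraph is to discriminate between solutions of (11) and (12), in order to relate them to admissible limit-points of $\Phi_N$, as $N \to \infty$.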



4.2. Stability, fundamental period and phase transition 

To gain a rough qualitative insight into the solution of (11), a standard approach relies on a linearization of the right-hand side around the fixed point (19). This yields a linear differential system, whose matrix

$$\frac{1}{\alpha + \beta + \gamma}\begin{pmatrix} 0 & -\alpha\gamma & \alpha\beta \\ \beta\gamma & 0 & -\alpha\beta \\ -\beta\gamma & \alpha\gamma & 0 \end{pmatrix}$$

has three eigenvalues $0$ and $\lambda_\pm = \pm i\sqrt{\frac{\alpha\beta\gamma}{\alpha + \beta + \gamma}}$, and the trajectories are located on an ellipsoid. But, since the above eigenvalues are purely imaginary, it is well known that no conclusion can be drawn for the original system, which might be of a quite different nature. However, it is pleasant to see that the modulus of the nonzero eigenvalues plays in fact a crucial role as shown in the next theorem.
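As a sanity check on the linearization, the sketch below builds the Jacobian of (11) at the fixed point (19) and verifies the Cayley-Hamilton identity $M^3 = -\omega^2 M$ with $\omega^2 = \alpha\beta\gamma/(\alpha+\beta+\gamma)$, which is consistent with a spectrum made of $0$ and a purely imaginary pair $\pm i\omega$. The function names and the sample parameters are assumptions chosen for illustration.

```python
def jacobian(alpha, beta, gamma):
    """Jacobian of system (11) evaluated at the fixed point (19)."""
    s = alpha + beta + gamma
    return [[0.0, -alpha * gamma / s, alpha * beta / s],
            [beta * gamma / s, 0.0, -alpha * beta / s],
            [-beta * gamma / s, alpha * gamma / s, 0.0]]


def matmul(a, b):
    """Plain 3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def cayley_hamilton_residual(alpha, beta, gamma):
    """Largest entry of M^3 + omega^2 M, which should vanish."""
    m = jacobian(alpha, beta, gamma)
    omega_squared = alpha * beta * gamma / (alpha + beta + gamma)
    m_cubed = matmul(m, matmul(m, m))
    return max(abs(m_cubed[i][j] + omega_squared * m[i][j])
               for i in range(3) for j in range(3))
```

A vanishing residual means the characteristic polynomial is $\lambda(\lambda^2 + \omega^2)$, so the linearized flow rotates with angular frequency $\omega$.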







Theorem 4.1. Let $s \stackrel{\text{def}}{=} \alpha + \beta + \gamma$ and $\eta \stackrel{\text{def}}{=} \frac{s}{3}$. The limit $\Phi = \lim_{N\to\infty} \Phi_N$ of the ABC model is deterministic and there is a second order phase transition phenomenon. There exists a critical value

$$\eta_c \stackrel{\text{def}}{=} \frac{2\pi}{3\sqrt{\rho_a\rho_b\rho_c}}$$

such that if $\eta > \eta_c$ then there are closed non-degenerate trajectories of (11) satisfying (12), with period $T(\kappa_p) = \frac{1}{p}$, $p \in \{1, \dots, [\frac{\eta}{\eta_c}]\}$. The only admissible stable $\Phi$ corresponds

- either to the trajectory associated with $\kappa_1$ if $\eta > \eta_c$;

- or to the degenerate one consisting of the single point (19) if $\eta < \eta_c$.
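The threshold in the theorem is easy to evaluate. The sketch below assumes the reading $\eta = s/3$ and $\eta_c = 2\pi/(3\sqrt{\rho_a\rho_b\rho_c})$, with the densities taken at the fixed point (19); both the reading and the function names are assumptions made for illustration.

```python
import math


def critical_eta(alpha, beta, gamma):
    """Critical value eta_c, with densities taken at the fixed point (19)."""
    s = alpha + beta + gamma
    rho_a, rho_b, rho_c = alpha / s, beta / s, gamma / s
    return 2 * math.pi / (3 * math.sqrt(rho_a * rho_b * rho_c))


def admissible_periods(alpha, beta, gamma):
    """Integers p with 1 <= p <= eta / eta_c, i.e. candidate periods 1/p."""
    eta = (alpha + beta + gamma) / 3
    return list(range(1, math.floor(eta / critical_eta(alpha, beta, gamma)) + 1))
```

For equal densities the threshold evaluates to $2\pi\sqrt{3}$, so with $\alpha = \beta = \gamma = 10$ no non-degenerate trajectory is admissible, while $\alpha = \beta = \gamma = 12$ allows exactly the period $1$.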

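The proof involves a forest of technicalities and we only sketch the main lines of argument. The first step is to switch to polar coordinates

$$u_a \stackrel{\text{def}}{=} \rho_a - \frac{\alpha}{s} = r\cos\theta, \qquad u_b \stackrel{\text{def}}{=} \rho_b - \frac{\beta}{s} = r\sin\theta.$$

Rewrite (14) as

$$H(r, \theta) = \log\kappa, \qquad (20)$$

where

$$H(r, \theta) = \alpha\log\Big[r\cos\theta + \frac{\alpha}{s}\Big] + \beta\log\Big[r\sin\theta + \frac{\beta}{s}\Big] + \gamma\log\Big[\frac{\gamma}{s} - r(\cos\theta + \sin\theta)\Big],$$

and let $r(\theta, \kappa)$ be the single root in $r$ of (20). Then $\theta$ satisfies the differential equation

$$\frac{d\theta}{dx} = G(\theta, \kappa), \qquad (21)$$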



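where

$$G(\theta, \kappa) \stackrel{\text{def}}{=} \frac{1}{s}\big[\beta(\alpha + \gamma)\cos^2\theta + \alpha\beta\sin 2\theta + \alpha(\beta + \gamma)\sin^2\theta\big] + r(\theta, \kappa)\cos\theta\sin\theta\big[(\beta + 2\alpha + 2\gamma)\cos\theta + (\alpha + 2\beta + 2\gamma)\sin\theta\big].$$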

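Letting $T(\kappa)$ be the period of the orbit, we have

$$T(\kappa) = \int_0^{2\pi} \frac{d\theta}{G(\theta, \kappa)}.$$

The second important step relies on the monotonic behaviour of $T(\kappa)$ with respect to the parameter $\kappa$, yielding the inequality

$$T(\kappa) \ge T(\tilde\kappa).$$

Observing that

$$r(\theta, \tilde\kappa) = 0, \quad \forall\theta \in [0, 2\pi],$$

we can write, by (21), $T(\tilde\kappa)$ as a contour integral on the unit circle, namely

$$T(\tilde\kappa) = -4is \oint \frac{dz}{z^2[\gamma(\beta - \alpha) - 2i\alpha\beta] + 2z[2\alpha\beta + \gamma(\alpha + \beta)] + \gamma(\beta - \alpha) + 2i\alpha\beta},$$

or, after a simple calculation,

$$T(\tilde\kappa) = 2\pi\sqrt{\frac{s}{\alpha\beta\gamma}},$$

which leads precisely to the critical value $\eta_c$ announced in the theorem.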
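The period of the degenerate orbit can be cross-checked by quadrature. The sketch below integrates $1/G$ over $[0, 2\pi]$ with $r = 0$, so that only the quadratic part $\frac{1}{s}[\beta(\alpha+\gamma)\cos^2\theta + \alpha\beta\sin 2\theta + \alpha(\beta+\gamma)\sin^2\theta]$ of $G$ remains, and compares the result with $2\pi\sqrt{s/(\alpha\beta\gamma)}$. The midpoint rule, the number of nodes and the function names are choices made for illustration.

```python
import math


def period_by_quadrature(alpha, beta, gamma, nodes=2000):
    """Midpoint-rule value of the integral of 1/G(theta) at r = 0."""
    s = alpha + beta + gamma
    h = 2 * math.pi / nodes
    total = 0.0
    for k in range(nodes):
        theta = (k + 0.5) * h
        g = (beta * (alpha + gamma) * math.cos(theta) ** 2
             + alpha * beta * math.sin(2 * theta)
             + alpha * (beta + gamma) * math.sin(theta) ** 2) / s
        total += h / g
    return total


def period_closed_form(alpha, beta, gamma):
    """Closed form 2*pi*sqrt(s / (alpha*beta*gamma))."""
    s = alpha + beta + gamma
    return 2 * math.pi * math.sqrt(s / (alpha * beta * gamma))
```

Requiring this period to be at most $1/p$ gives $\eta \ge p\,\eta_c$, which is how the threshold enters the periodicity constraint (12).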



5. Perspectives 



This paper is the continuation of [7], but is certainly an intermediate step. For the sake of brevity, we restricted ourselves to the thermodynamics of the ABC model. Actually, our goal is to analyze the dynamics of random curves evolving in $\mathbb{Z}^m$ (no spatial constraints) or in $\mathbb{Z}^m_+$, when they warp under the action of some stochastic deformation grammar.

In [6] , this project will be carried out in the framework of large scale dynamics 
for exclusion processes, and it will mainly address the points listed hereafter. 

More on thermodynamic equilibrium: The trick to derive limiting differen- 
tial systems amounts essentially to writing conditional flow equations on suitable 
sample paths, even in the presence of particle currents. These equations involve 
functionals of Markov and they enjoy special features encountered in many sys- 
tems. It might also be interesting to note that most of the Lotka-Volterra equations 
can be explained in the light of the famous urns of Ehrenfest. 

Phase transition: There exists a global interpretation by means of a free en- 
ergy functional with two components: the entropy of the system, and the algebraic 
area enclosed by the curve. It turns out that the contention between these two 
quantities yield, after taking limits lim^ ^ oolimjv ^ oo (in that order), either 
stretched deterministic curves or Brownian objects when the scaling is of central 
limit- type. 

Transient regime: Our claim is that time-dependent behaviour can be treated 
along the same ideas, up to technical subtleties, by means of a numerical scheme 
based on the conservation of particle currents. This should yield a system of Burgers equations, extending those obtained in [7] for the symmetric $\{\tau_a\tau_b\}$ model,
which had the form



' dp^{x,t) 

i ^ 

dp^{x,t) 

^ dt 






References 

[1] J. Burgers, A mathematical model illustrating the theory of turbulence, Adv. Appl. Mech., 1 (1948), pp. 171-199.

[2] M. Clincy, B. Derrida, and M. Evans, Phase transition in the ABC model, Phys. Rev. E, 67 (2003), pp. 6115-6133.

[3] B. Derrida, M. Evans, V. Hakim, and V. Pasquier, Exact solution for 1D asymmetric exclusion model using a matrix formulation, J. Phys. A: Math. Gen., 26 (1993), pp. 1493-1517.

[4] M. Evans, D. P. Foster, C. Godrèche, and D. Mukamel, Spontaneous symmetry breaking in a one dimensional driven diffusive system, Phys. Rev. Lett., 74 (1995), pp. 208-211.

[5] M. Evans, Y. Kafri, M. Koduvely, and D. Mukamel, Phase Separation and Coarsening in One-Dimensional Driven Diffusive Systems, Phys. Rev. E, 58 (1998), p. 2764.

[6] G. Fayolle and C. Furtlehner, Stochastic deformations of random walks and exclusion models. Part II: Gibbs states and dynamics in $\mathbb{Z}^2$ and $\mathbb{Z}^3$. In preparation.

[7] G. Fayolle and C. Furtlehner, Dynamical Windings of Random Walks and Exclusion Models. Part I: Thermodynamic limit in $\mathbb{Z}^2$, Journal of Statistical Physics, 114 (2004), pp. 229-260.

[8] O. Kallenberg, Foundations of Modern Probability, Springer, second ed., 2001.

[9] M. Kardar, G. Parisi, and Y. Zhang, Dynamic scaling of growing interfaces, Phys. Rev. Lett., 56 (1986), pp. 889-892.

[10] F. P. Kelly, Reversibility and stochastic networks, John Wiley & Sons Ltd., 1979. Wiley Series in Probability and Mathematical Statistics.

[11] R. Lahiri, M. Barma, and S. Ramaswamy, Strong phase separation in a model of sedimenting lattices, Phys. Rev. E, 61 (2000), pp. 1648-1658.

[12] T. M. Liggett, Interacting Particle Systems, vol. 276 of Grundlehren der mathematischen Wissenschaften, Springer-Verlag, 1985.

[13] J. Murray, Mathematical Biology, vol. 19 of Biomathematics, Springer-Verlag, second ed., 1993.

[14] H. Spohn, Large Scale Dynamics of Interacting Particles, Springer, 1991.

Guy Fayolle 

INRIA Rocquencourt - Domaine de Voluceau BP 105 

78153 Le Chesnay, France.

Guy.Fayolle@inria.fr 

Cyril Furtlehner 

INRIA Rocquencourt - Domaine de Voluceau BP 105 

78153 Le Chesnay, France.

Cyril.Furtlehner@inria.fr




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



A Markov Chain Algorithm for Eulerian 
Orientations of Planar Triangular Graphs 

Johannes Fehrenbach and Ludger Rüschendorf



ABSTRACT: On the set of Eulerian orientations of a planar Eulerian graph 
a natural Markov chain is defined and is shown to converge to the uniform distri- 
bution. For the class of planar triangular graphs this chain is proved to be rapidly 
mixing. The proof uses the path coupling technique of Bubley and Dyer (1997) 
and the comparison result of Randall and Tetali (1998). For the class of planar 
triangular graphs our result essentially improves the mixing rate result of Mihail and Winkler (1996) for general Eulerian graphs. As a consequence we obtain a faster polynomial randomized approximation scheme for counting the number of Euler orientations.



1. Introduction 

An undirected, connected graph $G = (V, E)$ is called an Euler graph if all vertices have even degree. A Eulerian orientation $X$ of $G$ is an orientation of the edges of $G$ such that for each vertex $v \in V$ the set of edges directed towards $v$ and the set of edges directed out of $v$ have the same cardinality, i.e., with $E^-(v) := \{e = (w, v) \in X \mid w \in V\}$ and $E^+(v) := \{e = (v, w) \in X \mid w \in V\}$ holds: $|E^+(v)| = |E^-(v)|$, for all $v \in V$.
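The defining condition is straightforward to test on a given orientation. The following Python sketch represents an orientation as a list of arcs (tail, head) and checks that in-degree and out-degree agree at every vertex; the representation and the function name are assumptions made for illustration.

```python
from collections import Counter


def is_eulerian_orientation(vertices, arcs):
    """Check |E^+(v)| == |E^-(v)| for every vertex v.

    `arcs` is an iterable of (tail, head) pairs, one per edge of the graph.
    """
    arcs = list(arcs)
    out_degree = Counter(tail for tail, _ in arcs)
    in_degree = Counter(head for _, head in arcs)
    return all(out_degree[v] == in_degree[v] for v in vertices)


# A directed triangle is Eulerian; reversing a single arc breaks the balance.
TRIANGLE = [(0, 1), (1, 2), (2, 0)]
BROKEN = [(1, 0), (1, 2), (2, 0)]
```

The check runs in time linear in the number of edges, which matches the remark that finding one Eulerian orientation is easy while counting them is hard.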

Counting the number of Euler orientations is relevant to some problems in 
statistical physics. Welsh (1990) observed that the crucial partition function of the 
ice-type model is equal to the number of Eulerian orientations of some underlying 
Eulerian graphs. It has also been observed that the counting problem for Euler- 
ian orientations corresponds to evaluating the Tutte polynomial, which encodes 
important information on the graph, at the point (0, —2). It is not difficult to con- 
struct a Eulerian orientation in polynomial time. The corresponding exact counting 
problem of all Eulerian orientations is however #P-complete as was established in 
Mihail and Winkler (1996). Mihail and Winkler (1996) also proved that counting 
of Eulerian orientations of G can be reduced to counting the perfect matchings 
of a related graph G'. Thus by the randomized approximation scheme (RAS) of 
Jerrum and Sinclair (1989) this yields a polynomial RAS for the orientations. The 
mixing time for this scheme is of the class O ((n')^m'(n'logn' + loge“^)), 6 the 
approximation error, n' = \V'\, m! = |P'|, G' = (F', £'). Since by the construction 
of G' in Mihail and Winkler (1996) n' > nm, m' > m? one gets mixing times of 
considerably high polynomial order.

In this paper, which is based on the dissertation of Fehrenbach (2003), we introduce a natural direct Markov chain on the set of Eulerian orientations and prove that in the case of planar triangular graphs one gets a considerably lower mixing rate order and thus one obtains a faster randomized scheme. A related construction and mixing rate result for sampling Eulerian orientations has been given for bounded Cartesian lattices with specified boundaries in Luby, Randall, and Sinclair (2001).

Our mixing rate result uses the path coupling method of Bubley and Dyer (1997). For an ergodic Markov chain $M$ with transition matrix $P = (p_{ij})$ on a finite set $\Omega$ a (Markov-) coupling is a stochastic process $(X_t, Y_t)_{t\in\mathbb{N}}$ on $\Omega \times \Omega$ such that for all $x, y, z \in \Omega$ and $t$

$$P(X_{t+1} = x \mid X_t = y, Y_t = z) = p_{yx},$$

$$P(Y_{t+1} = x \mid X_t = y, Y_t = z) = p_{zx},$$

and $X_t = Y_t$ implies $X_{t+1} = Y_{t+1}$. Then by the coupling lemma

$$\|P^{X_t} - P^{Y_t}\| \le P(X_t \ne Y_t), \qquad (1)$$

where $\|\cdot\|$ is the variation norm.

Let $(X_t, Y_t)_{t\in\mathbb{N}}$ be a coupling and let $\delta : \Omega \times \Omega \to \mathbb{N}$ be a metric, such that for some $\beta \le 1$ and all $t$

$$E(\delta(X_{t+1}, Y_{t+1}) \mid (X_t, Y_t) = (x, y)) \le \beta\,\delta(x, y). \qquad (2)$$



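Let $\tau(\epsilon)$ be the mixing time of the Markov chain for the approximation error $\epsilon$, then

$$\tau(\epsilon) \le \frac{\log(\delta(\Omega)\,\epsilon^{-1})}{1 - \beta} \quad \text{if } \beta < 1. \qquad (3)$$

If $\beta = 1$ and if for some $\alpha > 0$, $P(\delta(X_{t+1}, Y_{t+1}) \ne \delta(x, y) \mid (X_t, Y_t) = (x, y)) \ge \alpha$, for all $t$ and all $x, y \in \Omega$, then

$$\tau(\epsilon) \le \Big\lceil \frac{e\,\delta(\Omega)^2}{\alpha} \Big\rceil \lceil \log\epsilon^{-1} \rceil, \qquad (4)$$

where $\delta(\Omega) = \max\{\delta(x, y) \mid x, y \in \Omega\}$ is the diameter of $\Omega$ (see Dyer and Greenhill (1998), Aldous (1983)).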
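The two bounds are simple enough to evaluate directly. The sketch below encodes them as read here, namely $\log(\delta(\Omega)\epsilon^{-1})/(1-\beta)$ for the contracting case and $\lceil e\,\delta(\Omega)^2/\alpha\rceil\lceil\log\epsilon^{-1}\rceil$ for the case $\beta = 1$; the function names and the sample inputs are assumptions.

```python
import math


def mixing_bound_contracting(diameter, beta, eps):
    """Bound (3): valid when the contraction factor satisfies beta < 1."""
    if not 0 <= beta < 1:
        raise ValueError("bound (3) requires 0 <= beta < 1")
    return math.log(diameter / eps) / (1 - beta)


def mixing_bound_neutral(diameter, alpha, eps):
    """Bound (4): beta = 1, distance changes with probability at least alpha."""
    if alpha <= 0:
        raise ValueError("bound (4) requires alpha > 0")
    return math.ceil(math.e * diameter ** 2 / alpha) * math.ceil(math.log(1 / eps))
```

For example, a diameter of 2 with $\alpha = 1/2$ and $\epsilon = 0.1$ gives $\lceil 8e\rceil \cdot \lceil\log 10\rceil = 22 \cdot 3$ steps under the second bound.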

The path coupling method is a technique which simplifies the construction of a coupling on all of $\Omega \times \Omega$ that satisfies condition (2). It was introduced in Bubley and Dyer (1997). The following formulation is from Dyer and Greenhill (1998).

Let $S \subseteq \Omega \times \Omega$ be a set of transitions such that for all $x, y \in \Omega$ there exists a path $x = z_0 \to \dots \to z_r = y$ from $x$ to $y$ with transitions $(z_i, z_{i+1}) \in S$, $\forall i < r$. If $(x, y) \mapsto (X', Y')$ is an $M$-coupling for all $(x, y) \in S$, then an extension can be defined via the path in $S$ for any state $(X_t, Y_t) = (x, y)$. One obtains thus a sequence $Z_0', \dots, Z_r'$ and a coupling $(X_{t+1}, Y_{t+1})_{t\in\mathbb{N}}$ on $\Omega \times \Omega$ with $X_{t+1} = Z_0'$ and $Y_{t+1} = Z_r'$. For a function $\varphi : S \to \mathbb{N}_0$ we define a metric $\delta(x, y) := \min \sum_i \varphi(z_i, z_{i+1})$, the minimum taken over all paths from $x$ to $y$ in $S$. If for some $\beta \le 1$

$$E(\delta(X', Y') \mid (X, Y) = (x, y)) \le \beta\,\delta(x, y) \quad \text{for all } (x, y) \in S, \qquad (5)$$

then

$$E(\delta(X_{t+1}, Y_{t+1}) \mid (X_t, Y_t) = (x, y)) \le \beta\,\delta(x, y) \quad \text{for all } (x, y) \in \Omega \times \Omega. \qquad (6)$$

For the comparison of mixing times $\tau_1, \tau_2$ of two ergodic, reversible Markov chains $M_1, M_2$ with the same stationary distribution $\pi$ and transition functions $p_1, p_2$ an effective method has been developed by Randall and Tetali (1998). Let $T_i = \{(x, y) \in \Omega \times \Omega \mid p_i(x, y) > 0\}$, $i = 1, 2$, be the graphs of $M_1, M_2$ and let $\Gamma = \{\gamma_{x,y}; (x, y) \in T_2\}$ be a set of canonical paths in $T_1$ from $x$ to $y$ for any pair $(x, y) \in T_2$. For any $(w, z) \in T_1$ let $\Gamma(w, z) = \{(x, y) \in T_2; (w, z) \in \gamma_{x,y}\}$ be the set of all canonical paths in $\Gamma$ which contain the edge $(w, z)$. Finally, we denote by

$$A(\Gamma) := \max_{(w,z)\in T_1} \frac{1}{\pi(w)\,p_1(w, z)} \sum_{(x,y)\in\Gamma(w,z)} \pi(x)\,p_2(x, y)\,|\gamma_{x,y}|, \qquad (7)$$

where $|\gamma_{x,y}|$ is the length of the path $\gamma_{x,y}$, the comparison measure. Then the following holds for all $\epsilon \in (0, \frac{1}{2})$:

Tl(€) < (logTT-^ +log6-^) (8) 

where $\hat\pi = \min_{x\in\Omega} \pi(x)$.

This comparison result will be applied in section 3 to determine bounds for 
the mixing time of a natural Markov chain by comparison with a chain which is 
simpler to analyse. 



2. A Markov chain on Eulerian orientations of 
planar graphs 

Let $G = (V, E)$ be a planar, undirected, connected Euler graph and let $EO(G)$ denote the set of Eulerian orientations. Let $F(G)$ denote the open domains generated by the embedding of $G$ in $\mathbb{R}^2$. The inner domains are bounded while exactly one domain, the outer domain, is unbounded. For $\alpha \in F(G)$ and a Eulerian orientation $x$ we denote by $x^\alpha$ the edges in the boundary of $\alpha$ directed according to $x$. The inversion of an edge $e = (v, w) \in x$ is defined as $\bar e := (w, v)$. Similarly for $C \subseteq x$ we define the inversion $\bar C := \{\bar e; e \in C\}$. The following construction of a Markov chain $M_0(G) = (X_t)_{t\in\mathbb{N}}$ on $EO(G)$ is quite natural.

Markov chain $M_0(G)$ on $EO(G)$: To define the transition of $M_0(G)$ from $X_t = x \in EO(G)$, we use two steps:

1) Let $A \in F(G)$ be a randomly sampled domain.

2) If $A = \alpha$ and with $C := x^\alpha$ define

$$X' := \begin{cases} (x - C) \cup \bar C & \text{if } (x - C) \cup \bar C \in EO(G), \\ x & \text{else.} \end{cases} \qquad (9)$$

Then define

$$X_{t+1} := \begin{cases} X' & \text{with probability } \frac{1}{2}, \\ x & \text{else.} \end{cases} \qquad (10)$$

The corresponding transition matrix is denoted by $P_0$. The inversion of the edges in $x^\alpha$ for $\alpha \in F(G)$ is called $\alpha$-transition.

Thus the Markov chain randomly chooses domains of the planar graph and inverts with probability $\frac{1}{2}$ the orientation of the boundary if possible.

To determine an initial state $S_G$ of the Markov chain let $G^* = (V^*, E^*)$ be the dual graph with node set $V^*$ the set of domains $F(G)$ and where two domains $\alpha, \alpha'$ are connected in $V^*$ if $\alpha, \alpha'$ have a common boundary in $G$. For more details on planar graphs see Diestel (1996). $G^*$ is a bipartite graph. Let $V_1^*, V_2^*$ denote the partition of $V^*$ with corresponding partition $F_1, F_2$ of $F(G)$. W.l.o.g. let the outer domain of $G$ be in $F_2$. To define the initial state $S_G$ of the Markov chain, the edges of $G$ are oriented in such a way that for any $\alpha \in F_1$ the edges in $S_G^\alpha$ are oriented clockwise. This defines in fact a Eulerian orientation since any node $v \in V$ in the boundary of $\alpha$ is final node of two edges in the boundary of $\alpha$ which are oriented clockwise. Thus one edge points into $v$ while the other edge points out of $v$. For each edge towards $v$ thus there exists an edge out of $v$ and thus $|E^+(v)| = |E^-(v)|$ for all $v \in V$. Since the domains with $v$ in their boundary belong cyclically to $F_1$ or $F_2$, the edges of $v$ in $S_G$ are cyclically oriented out of $v$ or towards $v$ (see Figure 2.1). We use this Euler orientation $S_G$ as initial state of the chain. The analysis of the mixing time however will be independent of the initial state.

Figure 2.1 [left panel: triangular graph $G$; right panel: $S_G \in EO(G)$] The domains with clockwise orientation define $F_1$. $F_2$ consists of the domains which are oriented counterclockwise.

The following property of the initial state $S_G$ will be used in the following.

Lemma 2.1. If $C \subseteq S_G$ is a simple clockwise oriented circle in $G$, then $C^*$ is a minimal cut in $G^*$, which decomposes $V^*$ in two components $Z_1^*$ and $Z_2^*$. If the outer domain of $G$ belongs to $Z_2^*$, then all nodes in $Z_1^*$ which are final nodes of an edge in $C^*$ are in $V_1^*$.

Proof: The property that $C^*$ is a minimal cut is stated in Diestel (1996, Proposition 3.10). For $v \in V$ a node in the circle $C$ let $k$ be the number of edges $e$ on $v$ with dual edge $e^* \in Z_1^*$. $k$ is even since by construction of $S_G$ the edges on $v$ cyclically are out of $v$ or towards $v$. Therefore, for the domains $\alpha, \beta$ on both edges on $v$ in $C$ with $\alpha^*, \beta^* \in Z_1^*$ holds $\alpha^*, \beta^* \in V_1^*$ (see Figure 2.2). □

Figure 2.2 Orientation $S_G$. The edges on $v$ are cyclically in or out $v$. Thus $k$ is even.

Similarly, if $C$ as in Lemma 2.1 is counterclockwise oriented then the end nodes of edges in $Z_1^*$ are in $V_2^*$.

Theorem 2.2. Let $G = (V, E)$ be a planar undirected, connected Euler graph. Then the Markov chain $M_0(G)$ defined in (10) is ergodic. The stationary distribution $\pi$ of $M_0(G)$ is the uniform distribution on $EO(G)$.

Proof: By the construction of $M_0(G)$ the transition matrix is symmetric, $p_0(x, y) = p_0(y, x)$ for $x, y \in EO(G)$. Thus $M_0(G)$ is reversible with respect to the uniform distribution on $EO(G)$. By (10) it is also aperiodic. The main part of the proof is to establish irreducibility. We will prove by induction on $|F(G)|$ that there is a path from any $x \in EO(G)$ to the initial state $S_G$. This implies by symmetry irreducibility of $M_0(G)$.

1) $|F(G)| = 2$.

Then $G$ has two Euler orientations $S_G$ and $\bar S_G$. A transition from $\bar S_G$ to $S_G$ has by (9) positive probability.

2) Induction step: $|F(G)| > 2$.

Let $x \in EO(G)$. Let in the first case $x = \bar S_G$. Choosing in the first construction step of $M_0(G)$ iteratively points from the set $F_1$ one gets a finite path from $x$ to $S_G$. The result is independent of the sequence since the inversion of the orientation of edges on a domain in $F_1$ does not influence edges on other domains in $F_1$. No edge is in the boundary of two domains in $F_1$ (see also Figure 2.1).

If $x \ne \bar S_G$ then we consider $\tilde G = (\tilde V, \tilde E)$ with $\tilde E = \{e \in E : e_{S_G} \ne e_x\}$ and $\tilde V = \{v \in V : v \text{ is final node of an edge in } \tilde E\}$; $e_{S_G}$, $e_x$ denote the orientations of $e$ by $S_G$ resp. $x$. We first assume that $\tilde G$ is connected and define $\tilde S_G = \{e_{S_G} = (v, w) \in S_G : \{v, w\} \in \tilde E\}$, $\tilde x = \{e_x = (v, w) \in x : \{v, w\} \in \tilde E\}$ the corresponding orientations of $\tilde G$. Then $\tilde G$ is a Euler graph and $\tilde S_G, \tilde x \in EO(\tilde G)$. Since $x \ne \bar S_G$ we have $|F(G)| > |F(\tilde G)|$ and, therefore, from the assumption of the induction there is a path $(C_i)_{0\le i\le k}$ from $\tilde x = C_0$ to $\tilde S_G = C_k$ w.r.t. $M_0(\tilde G)$. As a consequence one gets also a path from $x$ to $S_G$ in $M_0(G)$.

Let the transition from $C_0$ to $C_1$ be an $\tilde\alpha$-transition for some $\tilde\alpha \in F(\tilde G)$. If $\tilde\alpha \in F(G)$ then this transition of $M_0(\tilde G)$ is also a transition of $M_0(G)$. If $\tilde\alpha \notin F(G)$ then the edges of $\tilde x^{\tilde\alpha}$ are a simple directed circle in $x$, which corresponds to a minimal cut in $G^*$. We denote the component which does not contain the outer domain of $G^*$ by $T^*$. The edges dual to those in $T^*$ are in $x$ and in $S_G$ identically oriented. Therefore, all nodes in $T^*$ which are final nodes of edges in $(\tilde x^{\tilde\alpha})^*$ are by Lemma 2.1 in the same partition of $G^*$, say in $V_2^*$. Since in $S_G$ the edges in $S_G^\beta$ are directed circles for any $\beta \in F(G)$, also the edges in $x^\gamma$ are directed circles in $x$ for nodes $\gamma^*$ in $T^* \cap V_2^*$. Thus starting from $x$ for all these $\gamma$ the $\gamma$-transitions can be made and as a result the edges at all $\alpha \in F(G)$ with $\alpha^* \in T^* \cap V_2^*$ form circles which get an inverse orientation. Finally the orientation of all edges $e \in E$ with $e^* \in T^*$ has two times been inverted, while the edges in $\tilde x^{\tilde\alpha}$ have one time been inverted. Thus the transition of $C_0$ to $C_1$ in $M_0(\tilde G)$ has been transferred to a sequence of transitions in $M_0(G)$. Iterating this procedure for all the other transitions we finally obtain a path from $x$ to $S_G$.

If $\tilde G$ has several components, then this procedure can be applied to all of these, producing finally a path in $M_0(G)$, which establishes irreducibility. Thus the Markov chain $M_0(G)$ is ergodic and the uniform distribution on $EO(G)$ is the unique stationary distribution of $M_0(G)$. □
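Irreducibility can be observed on a small instance. The following Python sketch takes the octahedron (a planar Eulerian triangular graph chosen here as an illustrative assumption), enumerates all Eulerian orientations by brute force, and then explores the states reachable by inverting directed domain boundaries. The encoding of orientations as tuples of booleans and all names are choices of this sketch, not of the paper.

```python
from itertools import product

# Octahedron: equator vertices 0..3, poles 4 and 5; all 8 domains are triangles.
FACES = [(pole, i, (i + 1) % 4) for pole in (4, 5) for i in range(4)]
EDGES = sorted({tuple(sorted(pair))
                for a, b, c in FACES
                for pair in ((a, b), (b, c), (c, a))})
INDEX = {edge: k for k, edge in enumerate(EDGES)}


def points_from_to(orientation, u, v):
    """True if edge {u, v} is directed u -> v (bit True means low -> high)."""
    return orientation[INDEX[(min(u, v), max(u, v))]] == (u < v)


def is_eulerian(orientation):
    """Every vertex has as many outgoing as incoming edges."""
    balance = [0] * 6
    for (u, v), bit in zip(EDGES, orientation):
        tail, head = (u, v) if bit else (v, u)
        balance[tail] += 1
        balance[head] -= 1
    return all(b == 0 for b in balance)


def flip_face(orientation, face):
    """Invert the boundary of `face` if it is a directed circle, else None."""
    a, b, c = face
    arcs = [(a, b), (b, c), (c, a)]
    if len({points_from_to(orientation, u, v) for u, v in arcs}) != 1:
        return None
    flipped = list(orientation)
    for u, v in arcs:
        k = INDEX[(min(u, v), max(u, v))]
        flipped[k] = not flipped[k]
    return tuple(flipped)


def reachable_from(start):
    """All orientations reachable from `start` by domain inversions."""
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        for face in FACES:
            nxt = flip_face(state, face)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


ALL_EULERIAN = {o for o in product((False, True), repeat=len(EDGES))
                if is_eulerian(o)}
REACHABLE = reachable_from(next(iter(ALL_EULERIAN)))
```

On this graph the reachable set should coincide with the brute-force enumeration, in line with the irreducibility statement of the theorem.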



3. Rapid mixing of Mo(G) for triangular graphs 

The aim of this section is to prove the rapid mixing property of the Markov chain $M_0(G)$ of section 2 for the case of planar, triangular graphs. A graph $G$ is called a triangular graph if the boundary of any inner domain contains exactly three nodes (cf. the example in Figure 2.1). Let $\tau_0(\epsilon)$ denote the mixing time of $M_0(G)$ for approximation error $\epsilon > 0$.

Theorem 3.1. Let $G = (V, E)$ be a Eulerian, planar triangular graph. Then the mixing time $\tau_0(\epsilon)$ of the Markov chain $M_0(G)$ is polynomially bounded and

'To(e) < 3072n^(loge“^)(3n + log6~^). (11) 

To establish the bound in (11) we introduce a modified Markov chain for 
planar Eulerian triangle graphs with extended transitions which is easier to analyse 
(see also the proof of Theorem 3.5) 

Lemma 3.2. Let $G = (V, E)$ be a planar, Eulerian triangular graph and $x \in EO(G)$. If $\alpha \in F(G)$ and $x^\alpha$ is not a directed circle, then $\alpha$ has at most one neighbour domain $\beta \in F(G)$ such that $x^\alpha \oplus x^\beta$ is a directed circle.

Proof: If $x^\alpha$ is not a directed circle, then there is an edge $e \in x^\alpha$ such that $(x^\alpha - \{e\}) \cup \{\bar e\}$ is a directed circle. This edge $e$ is not contained in any simple directed circle in $x$ together with any of the other two edges from $x^\alpha$. Therefore, only the neighbour domain $\beta$, separated from $\alpha$ by the edge $e$, can possibly satisfy that $x^\alpha \oplus x^\beta$ is a circle, see Figure 3.3 for the corresponding part of $G$.




Figure 3.3 



□ 



If for a Eulerian orientation $x$ it is not possible to invert the edges of a domain $\alpha$ then there can be one (but at most one) neighbour domain $\beta$ such that $x^\alpha \oplus x^\beta$ is a directed circle. The following modification $M_0'(G)$ of $M_0(G)$ allows the Markov chain to invert circles of this kind.



Definition 3.3. Let $G = (V, E)$ be a planar Eulerian triangular graph. The Markov chain $M_0'(G) = (X_t)_{t\in\mathbb{N}}$ on $EO(G)$ is defined by the following transitions of $M_0'(G)$ from $X_t = x \in EO(G)$:

1) Let $A \in F(G)$ be a randomly sampled domain.

2) For $A = \alpha$ and if $C := x^\alpha$ is a directed circle then define $X' := (x - C) \cup \bar C$. If $C$ is not a directed circle and if $\beta \in F(G)$ is a neighbour domain of $\alpha$ such that $C' = x^\alpha \oplus x^\beta$ is a directed circle, then set

$$X' := \begin{cases} (x - C') \cup \bar C' & \text{with probability } \frac{1}{6}, \\ x & \text{else.} \end{cases} \qquad (12)$$

If no neighbour $\beta$ of this type exists then set $X' := x$.

3) Define $X_{t+1} := \begin{cases} X' & \text{with probability } \frac{1}{2}, \\ X_t & \text{else.} \end{cases}$

Let $P_0' = (p_0'(x, y))$ denote the transition matrix of $M_0'(G)$. If $X' \ne x$ in the construction above then we speak of transitions of type 1 and type 2 corresponding to the use of simple circles or combined circles in step 2).

Theorem 3.4. Let $G = (V, E)$ be a Eulerian, planar, triangular graph. Then the Markov chain $M_0'(G)$ is ergodic and the stationary distribution $\pi$ of $M_0'(G)$ is the uniform distribution on $EO(G)$.







Proof: Obviously $M_0'(G)$ is irreducible and aperiodic since $M_0(G)$ has these properties. To prove reversibility, let $x, y \in EO(G)$, $x \ne y$ with $p_0'(x, y) > 0$. Then $p_0'(x, y) = p_0'(y, x)$ by the proof of Theorem 2.2 for those pairs $x, y$ with $p_0(x, y) > 0$. If $p_0(x, y) = 0$, then for some neighbour domains $\alpha, \beta$ holds, that $x^\alpha$ is not a directed circle and $y = (x - C) \cup \bar C$ with $C = x^\alpha \oplus x^\beta$, so that $(x, y)$ is a transition of type 2. Thus we have a situation as in Figure 3.4. One sees directly that $x^\beta$ and $y^\alpha$ are directed circles while $x^\alpha$ and $y^\beta$ are not directed circles. If in one step transition of the Markov chain $M_0'(G)$ started in $y$ one draws in step 1) $\beta$ then in step 2) one obtains $C' = y^\beta \oplus y^\alpha$ and thus $Y' = x$ with the same probability as for the transition from $x$ to $y$. Together we obtain $p_0'(y, x) = p_0'(x, y)$.

The mixing time $\tau_0'$ of $M_0'(G)$ can be efficiently bounded for this Markov chain in the case of triangular graphs.

Theorem 3.5. Let $G = (V, E)$ be a Eulerian, planar triangular graph. Then the mixing time $\tau_0'$ of $M_0'(G)$ is polynomially bounded in $n = |V|$:

$$\tau_0'(\epsilon) \le \lceil 32 e n^3 \rceil \lceil \log\epsilon^{-1} \rceil \quad \text{for all } \epsilon \in (0, 1). \qquad (13)$$

Proof: We apply the path coupling method with

$$S := \{(x, y) \in \Omega \times \Omega : \exists \text{ inner domain } \alpha \text{ such that } y = (x - x^\alpha) \cup \overline{x^\alpha}\},$$

where $\Omega = EO(G)$. The irreducibility of $M_0(G)$ implies that for any $(x, y) \in \Omega \times \Omega$ there exists a path $(z_i)_{0\le i\le r}$ with $(z_i, z_{i+1}) \in S$ for all $i < r$ and $z_0 = x$, $z_r = y$. Define $\varphi(x, y) = 1$ for all $(x, y) \in S$. Then $\delta(x, y)$ is the length of the shortest path from $x$ to $y$ in $S$. We have to determine the coupling on $S$.

For $(x_1, y_1) \in S$ with $y_1 = (x_1 - x_1^\alpha) \cup \overline{x_1^\alpha}$ for some inner domain $\alpha \in F(G)$ let $\beta, \gamma, \lambda$ denote the neighbour domains of $\alpha$. To construct the coupling $(X_2, Y_2)$ starting from $(X_1, Y_1) = (x_1, y_1)$ choose in step 1 for both pairs the same inner domain $\kappa \in F(G)$. After construction of $X_2$ the coupling transition step of $y_1$ is given as follows:

1. case: $\kappa \notin \{\alpha, \beta, \gamma, \lambda\}$.

In this case holds $x_1^\kappa = y_1^\kappa$ as well as $x_1^\kappa \oplus x_1^{\kappa'} = y_1^\kappa \oplus y_1^{\kappa'}$ for all neighbours $\kappa'$ of $\kappa$. Therefore, in step 2 for the transition of $x_1$ w.r.t. $M_0'(G)$ we have the same situation as for $y_1$ and thus may choose a transition from $y_1$ to $y_2$ parallel to that of $x_1$ to $x_2$. $y_2$ differs from $x_2$ only in the orientation of the edges in $x_1^\alpha$. Thus we obtain

$$(x_2, y_2) \in S \quad \text{and} \quad \delta(x_2, y_2) = \delta(x_1, y_1) = 1. \qquad (14)$$

2. case: $\kappa = \alpha$.

Then by step 2) we obtain $X_1' = y_1$ and $Y_1' = x_1$. Defining

$$(X_2, Y_2) := \begin{cases} (x_1, x_1) & \text{with probability } \frac{1}{2}, \\ (y_1, y_1) & \text{with probability } \frac{1}{2}, \end{cases}$$

we obtain a.s.

$$\delta(X_2, Y_2) = \delta(x_1, y_1) - 1 = 0. \qquad (15)$$

There are two extreme situations to consider.



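3. case: $\kappa \in \{\beta, \gamma, \lambda\}$.

Without loss of generality let $\kappa = \beta$. The first possible extreme situation is this: $x_1^\beta$ is no directed circle, but for some neighbour $\beta'$ of $\beta$, $\beta' \ne \alpha$, $C' = x_1^\beta \oplus x_1^{\beta'}$ is an oriented circle. Also assume that $y_1^\beta$ is no directed circle but $C'' = y_1^\beta \oplus y_1^{\beta''}$ for some neighbour $\beta''$ of $\beta$, $\beta'' \notin \{\alpha, \beta'\}$, is a directed circle. Then define

$$(X_2, Y_2) = \begin{cases} (x_1', y_1') & \text{with probability } \frac{1}{12}, \\ (x_1, y_1) & \text{with probability } \frac{11}{12}, \end{cases}$$

where $x_1' = (x_1 - C') \cup \bar C'$, $y_1' = (y_1 - C'') \cup \bar C''$. Then for this worst case we obtain

$$\delta(X_2, Y_2) \le \begin{cases} \delta(x_1, y_1) + 4 & \text{with probability } \frac{1}{12}, \\ \delta(x_1, y_1) & \text{with probability } \frac{11}{12}. \end{cases} \qquad (16)$$

Since $\delta(x_1, y_1) = 1$ we obtain from (14)-(16):

$$E(\delta(X_2, Y_2) \mid (X_1, Y_1) = (x_1, y_1)) - \delta(x_1, y_1) \le \frac{1}{|F(G)| - 1}\Big[(|F(G)| - 5)\cdot 0 + 1\cdot(-1) + 3\cdot\frac{1}{12}\cdot 4\Big] = \frac{1}{|F(G)| - 1}(-1 + 1) = 0 \qquad (17)$$

and thus $E(\delta(X_2, Y_2) \mid (X_1, Y_1) = (x_1, y_1)) \le \delta(x_1, y_1)$. Note that this situation allows to choose the transition of type 2 in Definition 3.3 with probability at most $\frac{1}{12}$.

The second extreme situation to consider is this: $x_1^\beta$ is no directed circle but $C = x_1^\beta \oplus x_1^\alpha$ is a directed circle. Simultaneously $y_1^\beta$ is a directed circle. Then the transitions of type 2 in $x_1$ and of type 1 in $y_1$ yield the same result $x_1' = y_1'$; the first transition occurs with probability $\frac{1}{12}$, the second with probability $\frac{1}{2}$. As a result we obtain in this second situation

$$\delta(X_2, Y_2) = \begin{cases} \delta(x_1, y_1) - 1 & \text{with probability } \frac{1}{12}, \\ \delta(x_1, y_1) + 1 & \text{with probability } \frac{5}{12}, \\ \delta(x_1, y_1) & \text{with probability } \frac{1}{2}, \end{cases}$$

and, therefore,

$$E(\delta(X_2, Y_2) \mid (X_1, Y_1) = (x_1, y_1)) - \delta(x_1, y_1) \le \frac{1}{|F(G)| - 1}\Big[(|F(G)| - 5)\cdot 0 + 1\cdot(-1) + 3\Big(-\frac{1}{12} + \frac{5}{12}\Big)\Big] = 0.$$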

In this second situation the transition of type 2 has to be chosen with probability at least $\frac{1}{12}$. So both of these situations lead to the transition probability $\frac{1}{12}$. This second situation is also the reason that the original Markov chain has to be modified by allowing transitions of type 2. For the Markov chain $M_0(G)$ the contraction condition is not fulfilled in this situation.
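The balance between the two extreme situations can be verified with exact arithmetic. The sketch below assumes, as a reading of the partly illegible source, that once a domain is drawn a type 1 transition is carried out with probability $1/2$ and a type 2 transition with probability $1/12$; the function names are hypothetical. Each function returns the bracketed sum in the expected change of $\delta$, that is, the expectation before division by the number of inner domains.

```python
from fractions import Fraction


def drift_first_situation(type2):
    """Drawing alpha gives -1; each of 3 neighbours adds 4 w.p. `type2`."""
    return -1 + 3 * type2 * 4


def drift_second_situation(type1, type2):
    """Drawing alpha gives -1; each neighbour gives -1 w.p. `type2`
    (both copies move to the same state) and +1 w.p. `type1 - type2`
    (only the second copy moves)."""
    return -1 + 3 * (-type2 + (type1 - type2))


TYPE1 = Fraction(1, 2)    # assumed acceptance probability of a type 1 move
TYPE2 = Fraction(1, 12)   # assumed overall probability of a type 2 move
```

With these values both drifts vanish. A larger type 2 probability makes the first drift positive and a smaller one makes the second drift positive, which illustrates why the two situations pin the probability down.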

As a result we obtain that the assumption of the path coupling theorem in 
(5) is fulfilled with /? = 1 and thus we obtain a coupling on Q x h. 

In the proof of Theorem 2.2 it was shown that any x in Q is connected with 
Sg by a path of length < 1E(G)| — 1. Thus we obtain (S(Q) = max^^ ^(^,2/) < 
2(|F(G)| - 1). 

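From case 2 we obtain for $(x_1,y_1) \in S$

$$\alpha := P\big(\delta(X_2,Y_2) \ne \delta(x_1,y_1) \mid (X_1,Y_1) = (x_1,y_1)\big) \ge (|F(G)| - 1)^{-1}. \qquad (18)$$

This estimate extends by an induction argument to any pair $(x_1,y_1) \in \Omega^2$. Thus
(4) implies

$$\tau_0'(\varepsilon) \le \lceil 4e(|F(G)|-1)^3 \rceil \lceil \log \varepsilon^{-1} \rceil. \qquad (19)$$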

If $a$ denotes the number of edges in the outer domain of $G$, then using that $G$ is
triangular we obtain with $n = |V|$, $m = |E|$, $\ell = |F(G)|$

$$3(\ell - 1) + a = 2m. \qquad (20)$$

Euler's polyhedron formula for planar graphs gives

$$n - m + \ell = 2. \qquad (21)$$

Thus we obtain $n + \ell - \tfrac{3}{2}(\ell - 1) - \tfrac{a}{2} = 2$. This implies

$$\ell < 2n \qquad (22)$$

and again by Euler's polyhedron formula

$$m < 3n. \qquad (23)$$
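
As a quick sanity check of (20)-(23), the two identities can be solved for the number of domains and the number of edges in terms of $n$ and $a$. The sketch below does this in Python; the function name `face_and_edge_counts` is hypothetical, and the planar drawing of $K_4$ is only an illustrative test case, not an example taken from the paper.

```python
def face_and_edge_counts(n, a):
    """Solve 3(l - 1) + a = 2m and n - m + l = 2 for (l, m)."""
    l = 2 * n - 1 - a      # number of domains, outer domain included
    m = n + l - 2          # Euler's formula rearranged
    return l, m

# K4 drawn in the plane: n = 4 nodes, outer domain bounded by a = 3 edges.
l, m = face_and_edge_counts(4, 3)
print(l, m)  # 4 domains, 6 edges
```

Since $a \ge 3$, the closed form $\ell = 2n - 1 - a$ makes the bounds $\ell < 2n$ and $m < 3n$ immediate.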

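From (19) we conclude

$$\tau_0'(\varepsilon) \le \lceil 4e\ell^3 \rceil \lceil \log \varepsilon^{-1} \rceil \le \lceil 32en^3 \rceil \lceil \log \varepsilon^{-1} \rceil.$$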

From the polynomial bound for the mixing time $\tau_0'$ of $M_0'(G)$ we next derive
the mixing time bound for $\tau_0$ by means of the comparison method of Randall and
Tetali (1998).

Proof of Theorem 3.1: In order to compare the mixing time $\tau_0(\varepsilon)$ of $M_0(G)$ with
$\tau_0'(\varepsilon)$ of $M_0'(G)$ we have to construct a set $\Gamma$ of canonical paths in $M_0(G)$, one for each
transition $(x,y)$ in $M_0'(G)$: $\Gamma = \{\gamma_{x,y} : (x,y) \text{ a transition in } M_0'(G)\}$.

1. case: $(x,y)$ is a type 1 transition.

Then for some $\alpha \in F(G)$, $y = (x - C) \cup \bar{C}$ with $C = x^\alpha$. This is also a transition
in $M_0(G)$ and thus we define $\gamma_{x,y} = (x,y)$.

2. case: $(x,y)$ is a type 2 transition.

Then $y = (x - C) \cup \bar{C}$ with $C = x^\alpha \oplus x^\beta$ for some inner neighbour domains
$\alpha, \beta \in F(G)$. $C$ is a directed circle while $x^\alpha$ is not a directed circle. Therefore, $x^\beta$
is also a directed circle, and the type 2 transition $(x,y)$ can be replaced by
two type 1 transitions, first in the domain $\beta$ and then in the domain $\alpha$. These two
transitions define the path $\gamma_{x,y}$. Together we obtain the set of canonical paths $\Gamma$.

We have to bound the comparison measure $A(\Gamma)$. Let $(w,z) \in T_1$ be a transition
of $M_0(G)$, i.e., $z = (w - C) \cup \bar{C}$, $C = w^\alpha$ for some inner domain $\alpha \in F(G)$.
If $\beta$ is a neighbour domain to $\alpha$, then $w^\beta$ and $z^\beta$ cannot both be directed circles
since one common edge of these transitions is inverted. Therefore, for each neighbour $\beta$
of $\alpha$ the transition $(w,z)$ is contained in at most one path $\gamma_{x,y} \in \Gamma$, and
the transition $(x,y)$ in $M_0'(G)$ is then of type 2. Thus we obtain $|\Gamma(w,z)| \le 4$ and,
therefore,

$$A(\Gamma) \le \max_{(w,z) \in T_1} \sum_{(x,y) \in \Gamma(w,z)} |\gamma_{x,y}|\, \frac{P_0'(x,y)}{P_0(w,z)} \le \max_{(w,z) \in T_1} 2\,|\Gamma(w,z)| \le 8.$$

Since the stationary distribution tt of Mo(G) is the uniform distribution on EO{G) 
we obtain 7 t“^ < 2^ with m = \E\. Also by (23) m < 3n and thus < 2^'^. 
This implies by the comparison result in (8): 



'^o(e) < 
< 

for all e € (0, |). 



41og((7fe) 

log((26)-i) 






log(23^^0 

- '^^og((2e)-i) 



[32en®][loge 



3072n^ [log € (3n + log e ^ ) 



□ 



Remarks 3.6. a) The mixing rates Tq, tq are given in Theorems 3.1 and 3.5 only 
in terms ofn= \V\ the number of nodes. Tq is of the order while tq is of the 
order . The effective mixing time of the chains could be much lower since the 
estimation by the path coupling method is uniform and thus too much concen- 
trated on the worst case. In comparison our rates are much better than the rates 
obtained by the general method of Mihail and Winkler (1996) which are of an 
order > In consequence the standard associated randomized approximation 
scheme (RAS) of our specialized Markov chain yields improved estimates for 
the number of Euler orientations. 

b) It is an open problem whether the chain Mo(G') in section 2 is rapidly mixing 
for all (or a large class of) planar graphs. The rapid mixing property of this 
chain has been established also for the Euler orientations in Cartesian grids in 
the dissertation of Fehrenbach (2003). 



References 

[1] Aldous, D. (1983). Random walks on finite groups and rapidly mixing Markov chains. 
Volume 986 of Lecture Notes in Mathematics, pp. 243-297. Springer. 

[2] Bubley, R. and M. Dyer (1997). Path coupling: a technique for proving rapid mixing
in Markov chains. In Proceedings of the 38th Annual IEEE Symposium on Foundations
of Computer Science (FOCS), pp. 223-231. IEEE Computer Society Press.

[3] Diestel, R. (1996). Graphentheorie. Springer. 

[4] Dyer, M. and C. Greenhill (1998). A more rapidly mixing Markov chain for graph
colorings. Random Structures and Algorithms 13, 285-317.

[5] Fehrenbach, J. (2003). Design und Analyse stochastischer Algorithmen auf
kombinatorischen Strukturen. PhD thesis, University of Freiburg.

[6] Jerrum, M. and A. Sinclair (1989). Approximating the permanent. SIAM Journal 
on Computing 18, 1149-1178. 

[7] Luby, M., D. Randall, and A. Sinclair (2001). Markov chain algorithms for planar 
lattice structures. SIAM Journal on Computing 31, 167-192. 

[8] Mihail, M. and P. Winkler (1996). On the number of Eulerian orientations of a graph. 
Algorithmica 16, 402-414. 







[9] Randall, D. and P. Tetali (1998). Analyzing Glauber dynamics by comparison of 
Markov chains, Volume 1380 of Lecture Notes in Computer Science, pp. 292-304. 
Springer. 

[10] Welsh, D. (1990). The computational complexity of some classical problems from 
statistical physics. In G. Grimmett and J. Hammersley (Eds.), Disorder in Physical 
Systems, pp. 307-321. Oxford University Press. 

[11] Welsh, D. (1998). Complexity: Knots, Colourings, and Counting, Volume 186 of LMS 
Lecture Note Series. Cambridge University Press. 

Johannes Fehrenbach

Department of Mathematics, University of Freiburg, Eckerstr. 1, 79104 Freiburg, 
Germany 

Ludger Rüschendorf

Department of Mathematics, University of Freiburg, Eckerstr. 1, 79104 Freiburg, 
Germany 

ruschen@stochastik.uni-freiburg.de 




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Regenerative Composition Structures: 
Characterisation and Asymptotics of 
Block Counts 



Alexander Gnedin 

(based on joint work with Jim Pitman and Marc Yor) 



A regenerative composition structure is a sequence $(\mathcal{C}_n)$ of random compositions
of integers $n = 1, 2, \ldots$ which satisfies two conditions:

• sampling consistency: if $n$ identical balls are distributed into an ordered
series of boxes according to $\mathcal{C}_n$, then a distributional copy of $\mathcal{C}_{n-1}$ is
obtained by discarding one of the balls picked uniformly at random, and
then deleting an empty box in case one is created.

• the first-part deletion invariance: for all $n > r \ge 1$, given that the first
part of $\mathcal{C}_n$ is $r$, the remaining composition of $n - r$ is distributed like $\mathcal{C}_{n-r}$.

The first condition is an ordered version of the defining property of Kingman's
partition structures, while the second generalises a deletion property of the partition
structure associated with the Ewens sampling formula (ESF). The following representation
is the principal characterisation result.

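Theorem 14. [3] For a measure $\nu$ on $]0,1]$ with finite first moment and $d \ge 0$ let

$$\Phi(n:r) = \binom{n}{r} \int x^r (1-x)^{n-r}\, \nu(dx) + nd\,1(r=1), \quad 1 \le r \le n, \qquad \Phi(n) = \sum_{r=1}^{n} \Phi(n:r).$$

Then for

$$q(n:r) = \frac{\Phi(n:r)}{\Phi(n)}$$

the function on compositions

$$p(\lambda_1, \ldots, \lambda_\ell) = \prod_{j=1}^{\ell} q(\Lambda_j : \lambda_j),$$

where $\Lambda_j = \lambda_j + \cdots + \lambda_\ell$, is the distribution of a regenerative composition structure.
The parameterisation of distributions by $(\nu, d)$ is unique subject to the normalisation
$\Phi(1) = 1$.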



There is a paintbox-type sampling scheme which constructs a composition
structure associated with $(\nu, d)$. Let $(S_t)$ be a subordinator with drift $d$ and the
Lévy measure obtained from $\nu$ by the transform $x \mapsto -\log(1 - x)$. Let $\mathcal{R} \subset [0,1]$ be
the closed range of the process $\tilde{S}_t = 1 - \exp(-S_t)$, and let $u_1, u_2, \ldots$ be an independent
sample from the uniform distribution. Call $u_i, u_j$ blocked if the interval spanned
on these points does not intersect $\mathcal{R}$. The composition $\mathcal{C}_n$ appears as the record of
clusters of blocked points among the first n sample points, from left to right.
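
The paintbox construction can be sketched in code for the simplest situation, which is an assumption made here and not a restriction in the text: a finite measure $\nu$ and $d = 0$, so that the subordinator is compound Poisson and the range of $1 - \exp(-S_t)$ is a stick-breaking set. The names `sample_composition` and `draw_jump` are hypothetical.

```python
import random

def sample_composition(n, draw_jump, rng):
    """Paintbox composition of n for a compound Poisson subordinator (d = 0).

    draw_jump(rng) must return a value in (0, 1]: the relative size x of one
    jump of 1 - exp(-S_t), so the range R is the stick-breaking set
    0 < r_1 < r_2 < ... with r_{k+1} = r_k + (1 - r_k) * x.
    """
    points = sorted(rng.random() for _ in range(n))  # uniform sample in [0, 1)
    parts, left, i = [], 0.0, 0
    while i < n:
        right = left + (1.0 - left) * draw_jump(rng)  # next point of R
        count = 0
        while i < n and points[i] < right:
            count += 1           # sample points sharing the gap (left, right)
            i += 1
        if count:
            parts.append(count)  # empty gaps are skipped
        left = right
    return parts

rng = random.Random(0)
print(sample_composition(10, lambda r: 1.0 - r.random(), rng))
```

Sample points falling in the same gap of the range are exactly the blocked clusters, and reading the gap counts from left to right gives the composition.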







There is a direct characterisation of regenerative composition structures in
terms of arrangements of special (unordered) partition structures. Call a nonnegative
function $d(\lambda, r)$ depending on a partition $\lambda$ and a part $r \in \lambda$ a deletion kernel if
$\sum_{r \in \lambda} d(\lambda, r) = 1$; such a kernel defines a subpartition of $\lambda$ obtained by
deleting a random part. Call a partition structure $(\mathcal{P}_n)$ deletion invariant if for all
$n > r \ge 1$, given that the deleted part of $\mathcal{P}_n$ is $r$, the remaining partition of $n - r$
is distributed like $\mathcal{P}_{n-r}$. Starting with some partition $\lambda$, repeated application of
the deletion rule encoded in $d$ yields an ordering of the parts of $\lambda$.

Theorem 15. [4] For a partition structure with distribution $p$, invariant with respect
to some deletion kernel $d$, the arrangement of parts in the order of deletion yields
a regenerative composition structure. The distribution of such a partition structure
is described then by the formula

$$p(\lambda_1, \ldots, \lambda_\ell) = \sum_{\sigma} \prod_{j=1}^{\ell} q(\Lambda_{\sigma(j)} : \lambda_{\sigma(j)}),$$

where the sum expands over all distinct compositions $(\lambda_{\sigma(1)}, \ldots, \lambda_{\sigma(\ell)})$ derived from
partition $\lambda$ by arranging the parts in some order, and

$$q(n:r) = \sum_{\{\lambda \vdash n;\ r \in \lambda\}} d(\lambda, r)\, p(\lambda).$$



A particular choice of the measure, $\nu[x,1] = x^{-\alpha}(1-x)^{\theta}$ with $\alpha \in [0,1]$, $\theta \ge 0$,
yields a regenerative composition structure whose associated partition structure
belongs to the Ewens-Pitman two-parameter family (and for other values of the
parameters no regenerative realisation is possible). For example, the case $\alpha = 0$, $\theta > 0$
corresponds to the GEM paintbox, the case $\alpha > 0$, $\theta = 0$ to the paintbox induced
by the zero set of a Bessel process, and the case $\alpha = \theta$ is associated with a Bessel
bridge (in the last case there is a further characterisation by reversibility or by the
invariance of the distribution of $\mathcal{C}_n$ under permutations of parts). Partition structures
from the two-parameter family are deletion-invariant with respect to the kernel

$$d(\lambda, r) = \frac{1}{n}\, \frac{(n-r)\tau + r(1-\tau)}{1 - \tau + (\ell - 1)\tau}$$

(the formula distinguishes among the equal parts), and the deletion property is
characteristic (the ESF being a special case $\tau = 0$).
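
A short exact check that this kernel is a probability distribution over the parts can be written as follows. It is a sketch under two assumptions made here: $\ell$ is read as the number of parts of $\lambda$ and $n$ as their sum, and equal parts are counted separately, as the parenthetical remark suggests. The function name `deletion_kernel` is hypothetical.

```python
from fractions import Fraction

def deletion_kernel(parts, r, tau):
    """d(lambda, r) for one occurrence of a part r of the partition `parts`."""
    n, l = sum(parts), len(parts)
    return Fraction(1, n) * ((n - r) * tau + r * (1 - tau)) / (1 - tau + (l - 1) * tau)

parts = [4, 2, 2, 1]
for tau in (Fraction(0), Fraction(1, 3), Fraction(1)):
    total = sum(deletion_kernel(parts, r, tau) for r in parts)
    print(tau, total)  # total is 1 for every tau
```

For $\tau = 0$ the kernel reduces to $r/n$, that is, size-biased deletion of a part.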

A novel example of composition structure is provided by the formula [1] 



!>w=^n 



(»]„ V A,ft(A,) ’ 



where h(r) = 






which corresponds to a gamma-like subordinator with u{dx) = x-\i-xy-\e> 

0. 

Let $K_n$ be the number of parts of $\mathcal{C}_n$ and let $K_{n,r}$ be the multiplicity of part
$r$. Next is a summary of available results regarding these functionals. Suppose
$d = 0$. When $\nu$ is finite, under very mild assumptions the distribution of $K_n$ is
close to normal, with both expectation and variance of the order $\log n$ [2], and
$K_{n,r}$ is bounded for each $r$ as $n \to \infty$. When $\nu[x,1] \sim \ell(1/x)\,x^{-\alpha}$ as $x \downarrow 0$
for slowly varying $\ell$ and $0 < \alpha < 1$, we have $K_n/(\ell(n)n^{\alpha})$ strongly converging
to the exponential functional of a subordinator (the case $\alpha = 1$ is special), and
a similar joint convergence result holds for the $K_{n,r}$'s [5]. A very interesting case,
intermediate between the above two, is the case of gamma-type subordinators with
$\nu[x,1] \sim -\log x$: in this case $K_n$ is again asymptotically normal, with expectation
and variance growing like $\log^2 n$ and $\log^3 n$, respectively, and $(K_{n,1}, K_{n,2}, \ldots)$ is
jointly approximately normal [6].

References 

[1] A. V. Gnedin. Three sampling formulas, Combin. Probab. Comput., 13: 185-193, 2004.

[2] A. V. Gnedin. The Bernoulli sieve, Bernoulli, 10: 79-96, 2004. 

[3] A. V. Gnedin and J. Pitman. Regenerative composition structures, Ann. Probab., 
2004 (to appear) 

[4] A.V. Gnedin and J. Pitman. Regenerative partition structures, (paper in progress) 

[5] A. V. Gnedin, J. Pitman and M. Yor. Asymptotic laws for compositions derived from
transformed subordinators, available at arXiv.

[6] A. V. Gnedin, J. Pitman and M. Yor. Asymptotic laws for regenerative composition
structures: gamma subordinators and the like, available at arXiv.

Alexander Gnedin 

Utrecht University, The Netherlands 




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Random Walks on Groups With a Tree-Like 
Cayley Graph 

Jean Mairesse and Frederic Matheus 

ABSTRACT: We consider a transient nearest neighbor random walk on a
group $G$ with finite set of generators $\Sigma$. The pair $(G, \Sigma)$ is assumed to admit a
natural notion of normal form words which are modified only locally when multiplied
by generators. The basic examples are the free products of a finitely generated
free group and a finite family of finite groups, with natural generators. We prove
that the harmonic measure is Markovian and can be completely described via a
finite set of polynomial equations. This makes it possible to compute the drift, the entropy,
the probability of ever hitting an element, and the minimal positive harmonic
functions of the walk. The results extend to monoids. In several simple cases of
interest, the set of polynomial equations can be explicitly solved, to get closed
form formulas for the drift, the entropy, etc. Various examples are treated: the modular
group $\mathbb{Z}/2\mathbb{Z} \star \mathbb{Z}/3\mathbb{Z}$, the Hecke groups $\mathbb{Z}/2\mathbb{Z} \star \mathbb{Z}/k\mathbb{Z}$, the free products of two
isomorphic cyclic groups $\mathbb{Z}/k\mathbb{Z} \star \mathbb{Z}/k\mathbb{Z}$, the braid group $B_3$, and Artin groups with
two generators.



1. Introduction

The properties of the harmonic measure associated with a random walk on a 
finitely generated free group have been studied by many authors [6, 14, 16, 21]. A 
remarkable result is that for nearest neighbor random walks, the harmonic measure 
is Markovian. If the probability defining the random walk does not only charge 
the generators but is of finite support instead, then the result fails to be true, see 
[14]. In other words, having a Markovian harmonic measure is a property which 
depends not only on the group G but also on the chosen set of generators E. Here, 
we prove the existence of a Markovian harmonic measure for a whole class of pairs 
{G, E). We then use the property for computational purposes. 

A pair $(G, \Sigma)$ formed by a group (group law $*$, unit element $1_G$) and a finite
set of generators is called 0-automatic if the set of words
$L(G, \Sigma) = \{u_1 \cdots u_k \mid \forall i,\ u_i * u_{i+1} \notin \Sigma \cup 1_G\}$ is a cross-section of $G$.

Consider a group $G = \mathbb{F}(S) \star G_1 \star \cdots \star G_k$ which is a free product of a
finitely generated free group and a finite family of finite groups, also called a plain
group. Consider the natural (but not necessarily minimal) set of generators
$\Sigma = S \sqcup S^{-1} \sqcup \bigsqcup_i G_i \setminus \{1_{G_i}\}$. Then the pair $(G, \Sigma)$ is 0-automatic. Now consider an arbitrary
0-automatic pair $(G, \Sigma)$. Then $G$ is isomorphic to a plain group. This is not the
end of the story since $\Sigma$ may be different from (in particular, larger than) the natural set
of generators of the group seen as a free product. (For an example of this, see the
end of §3.) And what is relevant in our context is the pair group-generators rather
than the group itself, since the generators form the support of the measure defining
the nearest neighbor random walk. Hence it seems appropriate and convenient to
coin a specific term, “0-automatic pairs”, to highlight the notion.

Apart from the free group with one generator and the group $\mathbb{Z}/2\mathbb{Z} \star \mathbb{Z}/2\mathbb{Z}$, all
the plain groups are non-amenable. It follows that any random walk living on the
whole group is transient and has a strictly positive drift (see [11] and [26, Chapter
1.B] for details).

Consider a transient nearest neighbor random walk $(X_n)_n$ on a 0-automatic
pair $(G, \Sigma)$. Let $(Y_n)_n$ be the corresponding random walk on normal form words.
The harmonic measure $\mu^\infty$ is the law of $Y_\infty = \lim_n Y_n$. We prove that this harmonic
measure is Markovian. The transition probabilities of this Markov chain are the
unique solutions of a set of polynomial equations of degree 2, that we call the
Traffic Equations. We can then compute the drift and the entropy of the random
walk. Some additional work is needed to compute the probability of ever hitting
a generator, which leads to a computational description of the minimal positive
harmonic functions. Mutatis mutandis, the results extend to monoids.

All the random walks considered here belong to the general setting of random 
walks on regular languages studied in [15]. In [15], local limit theorems are proved. 
Also, our random walks can be viewed as random walks on a tree with finitely 
many cone types in the sense of [19], but with one-step moves at distance 1 and 2. 
In [19] , it is proved, among other things, that the harmonic measure is Markovian 
for nearest neighbor random walks. See also [23]. The method of proof is different 
from the one we use, see the discussion in §5.2. 

A natural question is whether the Traffic Equations can be solved “explicitly”,
in order to get for instance a closed form formula for the drift or the entropy. This
is feasible in many situations. We treat completely the following cases: the general
nearest neighbor random walk on the modular group $\mathbb{Z}/2\mathbb{Z} \star \mathbb{Z}/3\mathbb{Z}$, three
one-parameter families of random walks on $\mathbb{Z}/3\mathbb{Z} \star \mathbb{Z}/3\mathbb{Z}$, a one-parameter family of
random walks on the three strands braid group $B_3$, and the simple random walks
on $\mathbb{Z}/k\mathbb{Z} \star \mathbb{Z}/k\mathbb{Z}$, on the Hecke groups $\mathbb{Z}/2\mathbb{Z} \star \mathbb{Z}/k\mathbb{Z}$, and on the Artin groups with
two generators.

None of these computations appeared in the literature before. For the few 
examples of non-elementary explicit computations previously available, see [5, 6, 
16, 19, 20, 21]. Nevertheless, there exists an alternative potential method for the 
effective computation of the drift (not the entropy) which is due to Sawyer and Ste- 
ger [21]. In this approach, the drift is expressed as a functional of the first-passage 
generating series of the random walk. The simplest of our computational results 
can also be retrieved using this method. We detail and discuss this method in 
§6.1. The Sawyer-Steger method links the problem of computing the drift with the 
problem of computing the generating series of transition probabilities. Concern- 
ing the latter problem, there exists an important literature, especially for random 
walks on free groups and free products, see [2, 3, 25] and [26, Sections II.9 and 
III. 17]. 

The usual motivation for studying generating functions of transition proba- 
bilities is to get central or local limit theorems. For much more material on this 
and on other aspects of random walks on discrete infinite groups only touched 
upon here (like boundary theory), see [13, 24, 26] and the references there. 

The results, presented here without proofs, are set out in full detail in
[17, 18].







Let $\mathbb{N}$ be the set of non-negative integers. If $\mu$ is a measure on a group $(G, *)$,
then $\mu^{*n}$ is the $n$-fold convolution product of $\mu$, that is, the image of the product
measure $\mu^{\otimes n}$ by the product map $G \times \cdots \times G \to G$, $(g_1, \ldots, g_n) \mapsto g_1 * g_2 * \cdots * g_n$.
The symbol $\sqcup$ is used for the disjoint union of sets.



2. Random walks on groups 

Given a set $\Sigma$, the free monoid it generates is denoted by $\Sigma^*$. As usual, $\Sigma$ is called
the alphabet, and the elements of $\Sigma$ and $\Sigma^*$ are called respectively the letters and words.
The empty word is denoted by $1_{\Sigma^*}$. The length (number of letters) of a word $u$ is
denoted by $|u|_\Sigma$.

Consider a finitely generated group $(G, *)$ with unit element $1_G$. Let $\Sigma$ be a
finite set of generators of $G$ (with $1_G \notin \Sigma$ and $u \in \Sigma \Rightarrow u^{-1} \in \Sigma$). Denote by
$\pi : \Sigma^* \to G$ the monoid morphism which associates to a word $a_1 \cdots a_k$ the group
element $a_1 * \cdots * a_k$. The length with respect to $\Sigma$ of a group element $u$ is:

$$|u|_\Sigma = \min\{k \mid u = s_1 * \cdots * s_k,\ s_i \in \Sigma\}. \qquad (1)$$

A word $u \in \Sigma^*$ is a geodesic if $|u|_\Sigma = |\pi(u)|_\Sigma$.

The Cayley graph $X(G, \Sigma)$ of a group $G$ with respect to a set of generators
$\Sigma$ is the directed graph with $G$ as the set of nodes and with an arc from $u$ to $v$ if
$u^{-1} * v \in \Sigma$. It is often convenient to view $X(G, \Sigma)$ as a labelled graph with set
of labels $\Sigma$ (with $u \xrightarrow{a} v$ if $u * a = v$). Observe that $|u|_\Sigma$ is the geodesic distance
from $1_G$ to $u$ in the Cayley graph.

Let $G_1 \star G_2$ be the free product of two groups $G_1$ and $G_2$. Roughly, the
elements of $G_1 \star G_2$ are the finite alternate sequences of elements of $G_1 \setminus \{1_{G_1}\}$ and
$G_2 \setminus \{1_{G_2}\}$, and the group law is the concatenation with simplification. More
rigorously, the definition is as follows. Set $S = G_1 \sqcup G_2$. Let $\sim$ be the least congruence
on $S^*$ containing the relations: $\forall u, v \in S^*$, $\forall i \in \{1,2\}$, $u 1_{G_i} v \sim uv$; $\forall a, b, c \in
G_i$ s.t. $c = a * b$, $uabv \sim ucv$. The quotient monoid $(S^*/\!\sim)$ is a group called the
free product $G_1 \star G_2$.

Let $\mu$ be a probability distribution over $\Sigma$. Consider the Markov chain on
the state space $G$ with one-step transition probabilities given by: $\forall g \in G$, $\forall a \in \Sigma$,
$P_{g,\, g * a} = \mu(a)$. This Markov chain is called the random walk (associated with)
$(G, \mu)$. It is a nearest neighbor random walk: one-step moves occur between nearest
neighbors in the Cayley graph $X(G, \Sigma)$. When $\mu(s) = 1/|\Sigma|$ for all $s \in \Sigma$, we say
that the random walk is simple.

Let $(x_n)_n$ be a sequence of i.i.d. r.v.'s distributed according to $\mu$. Set

$$X_0 = 1_G, \qquad X_{n+1} = X_n * x_n = x_0 * x_1 * \cdots * x_n. \qquad (2)$$

The sequence $(X_n)_n$ is a realization of the random walk $(G, \mu)$. The law of $X_n$ is $\mu^{*n}$.

Drift, entropy. The first step in understanding the asymptotic behaviour of $X_n$
consists in studying the length $|X_n|_\Sigma$ as $n \to \infty$. Since $|uv|_\Sigma \le |u|_\Sigma + |v|_\Sigma$,
Guivarc'h [11] observed that a simple corollary of Kingman's subadditive ergodic
theorem is the existence of a constant $\gamma \in \mathbb{R}_+$ such that a.s. and in $L^p$, for all
$1 \le p < \infty$,

$$\lim_{n \to \infty} |X_n|_\Sigma / n = \gamma. \qquad (3)$$

We call $\gamma$ the drift of the random walk. Intuitively, $\gamma$ is the speed of escape to
infinity of the walk.
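
To make the notion of drift concrete, here is a small simulation sketch for one case where the value is classical: the simple random walk on the free group with two generators, whose drift is 1/2. The encoding (upper-case letters for inverses) and the names `step` and `estimate_drift` are illustrative choices made here, not notation from the paper.

```python
import random

INVERSE = {"a": "A", "A": "a", "b": "B", "B": "b"}  # A = a^-1, B = b^-1

def step(word, letter):
    """Multiply the reduced word (a list) on the right by one generator."""
    if word and word[-1] == INVERSE[letter]:
        word.pop()            # cancellation: the length drops by one
    else:
        word.append(letter)   # no cancellation: the length grows by one

def estimate_drift(n_steps, rng):
    """Return |X_n| / n for one trajectory of the simple random walk on F(a, b)."""
    word = []
    for _ in range(n_steps):
        step(word, rng.choice("aAbB"))
    return len(word) / n_steps

print(estimate_drift(100_000, random.Random(0)))  # close to 1/2
```

Away from the identity the length increases with probability 3/4 and decreases with probability 1/4, which is where the value 1/2 comes from.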







Another quantity of interest is the entropy. The entropy of a probability
measure $p$ with finite support $S$ is defined by $H(p) = -\sum_{x \in S} p(x) \log p(x)$. The
entropy of the random walk $(G, \mu)$, introduced by Avez [1], is

$$h = \lim_n \frac{H(\mu^{*n})}{n} = \lim_n -\frac{1}{n} \log \mu^{*n}(X_n), \qquad (4)$$

a.s. and in $L^p$, for all $1 \le p < \infty$. The existence of the limits as well as their
equality follow again from Kingman's subadditive ergodic theorem.

3. Zero-automaticity 

Let $G$ be a group with finite set of generators $\Sigma$. A language $L$ of $\Sigma^*$ is a
cross-section of $G$ (over the alphabet $\Sigma$) if the restriction of $\pi$ to $L$ defines a bijection,
that is if every element of $G$ has a unique representative in $L$. A word of $L$ is
then called a normal form word. The map $\varphi : G \to L$ which associates to a group
element its unique representative in $L$ is called the normal form map.

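Definition 3.1. Let $G$ be a group with finite set of generators $\Sigma$. Define the sets

$$\forall a \in \Sigma, \quad \mathrm{Next}(a) = \{b \in \Sigma \mid a * b \notin \Sigma \cup \{1_G\}\}, \quad \mathrm{Prev}(a) = \{b \in \Sigma \mid b * a \notin \Sigma \cup \{1_G\}\}, \qquad (5)$$

and the language of $\Sigma^*$,

$$L(G, \Sigma) = \{u_1 \cdots u_k \mid \forall i \in \{2, \ldots, k\},\ u_i \in \mathrm{Next}(u_{i-1})\} \qquad (6)$$
$$\phantom{L(G, \Sigma)} = \{u_1 \cdots u_k \mid \forall i \in \{1, \ldots, k-1\},\ u_i \in \mathrm{Prev}(u_{i+1})\}.$$

We say that the pair $(G, \Sigma)$ is 0-automatic if $L(G, \Sigma)$ is a cross-section of $G$.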

Here are some consequences of Definition 3.1. First, let $\varphi : G \to L(G, \Sigma)$ be
the normal form map. Then: $\forall g \in G$ s.t. $\varphi(g) = u_1 \cdots u_k$, $\forall a \in \Sigma$,

$$\varphi(g * a) = \begin{cases} u_1 \cdots u_{k-1} & \text{if } a = u_k^{-1} \\ u_1 \cdots u_{k-1} v & \text{if } u_k * a = v \in \Sigma \\ u_1 \cdots u_{k-1} u_k a & \text{if } u_k * a \notin \Sigma \cup 1_G \end{cases} \qquad (7)$$

and the analog for $\varphi(a * g)$ also holds.

Second, the language $L(G, \Sigma)$ is regular and recognized by the following
automaton: Set of states: $\Sigma \cup 1_G$, initial state: $1_G$, final states: $\Sigma \cup 1_G$; Transitions:
$a \xrightarrow{b} b$ if $a * b \notin \Sigma \cup 1_G$.

Third, the Cayley graph $X(G, \Sigma)$ has a tree-like structure. In particular,
$X(G, \Sigma)$ has uniform node-connectivity 1, that is, the removal of any node
disconnects the graph, see Figure 6.

Fourth, all the group elements have a unique geodesic representative with
respect to $\Sigma$, and the set of these geodesic representatives is precisely $L(G, \Sigma)$.

Fifth, the set of simple circuits going through the node $1_G$ in $X(G, \Sigma)$ is
finite. In fact, this last property is equivalent to the property that $L(G, \Sigma)$ be a
cross-section, see [12].
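
The local update rule (7) for normal forms can be sketched in code on one concrete 0-automatic pair, the free product of a cyclic group of order 2 and a cyclic group of order 3 with its natural generators. The encoding of a generator as a pair (factor, value) and the function name `multiply_right` are assumptions made for this illustration only.

```python
ORDERS = (2, 3)  # factor 0 is Z/2Z, factor 1 is Z/3Z

def multiply_right(word, gen):
    """Apply rule (7): return the normal form of g * a.

    `word` is a tuple of letters (factor, value) with 1 <= value < order,
    consecutive letters lying in different factors; `gen` is one such letter.
    """
    if word and word[-1][0] == gen[0]:          # u_k * a stays in the same factor
        factor = gen[0]
        value = (word[-1][1] + gen[1]) % ORDERS[factor]
        if value == 0:                          # a = u_k^{-1}: drop the last letter
            return word[:-1]
        return word[:-1] + ((factor, value),)   # u_k * a = v is a generator
    return word + (gen,)                        # u_k * a is not in Sigma or 1_G

a, b = (0, 1), (1, 1)
w = ()
for g in (a, b, b, b, a):
    w = multiply_right(w, g)
    print(w)
```

Only the last letter of the word is ever inspected or modified, which is the sense in which normal forms change "only locally" under multiplication by a generator.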

Here are examples of 0-automatic pairs:

• Let $G$ be a finite group. Then $(G, G \setminus \{1_G\})$ is 0-automatic.

• Let $\mathbb{F}(S)$ be the free group generated (as a group) by $S$. Set $\Sigma = S \sqcup S^{-1}$.
Then $(\mathbb{F}(S), \Sigma)$ is 0-automatic.

• Let $(G_1, \Sigma_1)$ and $(G_2, \Sigma_2)$ be 0-automatic. Then $(G_1 \star G_2, \Sigma_1 \sqcup \Sigma_2)$ is
0-automatic.









Figure 3.1 The Cayley graph of G with respect to {a, a 6, 6 (left), and {a, a 6a} 
(right). 



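Following [12], define a plain group as the free product of a finitely generated
free group and a finite family of finite groups. Let $G = \mathbb{F}(S) \star G_1 \star \cdots \star G_k$ be
a plain group. Then $\Sigma = S \sqcup S^{-1} \sqcup \Sigma_1 \sqcup \cdots \sqcup \Sigma_k$, with $\Sigma_i = G_i \setminus \{1_{G_i}\}$, is a set of generators
for $G$ that we call the natural generators. It follows from the above that $(G, \Sigma)$ is
0-automatic. The sets $\mathrm{Next}(\cdot)$, $\mathrm{Prev}(\cdot)$, defined in (5), can be made explicit:

$$\forall a \in S \sqcup S^{-1},\ \mathrm{Next}(a) = \mathrm{Prev}(a) = \Sigma \setminus \{a^{-1}\}, \qquad \forall a \in \Sigma_i,\ \mathrm{Next}(a) = \mathrm{Prev}(a) = \Sigma \setminus \Sigma_i.$$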

A 0 -automatic pair (G, E) is precisely a unique factorization pair in Stallings 
[22], with the additional assumption that E be finite. Using the results from [22], 
we get: 

Proposition 3.2. Let (G, E) be a 0-automatic pair. Then G is isomorphic to a plain 
group. 

The proof of Stallings is constructive and provides more precise information.
Let $\varphi$ be any isomorphism from $G$ to the plain group $\tilde{G}$. Then there exists a set
$\tilde{S}$ of natural generators of $\tilde{G}$ such that $\tilde{S} \subset \varphi(\Sigma)$. Note that the inclusion may be
strict. Also, for $u \in \varphi(\Sigma)$, we have $|u|_{\tilde{S}} = 1$, 2, or 3.

Plain groups are hyperbolic in the sense of Gromov [10] and automatic in
the sense of Epstein et al. [7]. Besides, $(G, \Sigma)$ is 0-automatic iff $(\Sigma, L(G, \Sigma))$ is an
automatic pair (in the sense of [7]) satisfying the 0-fellow traveller property. This
is our justification for the chosen terminology.

Consider the group $G = \langle a, b \mid abab = 1 \rangle$. (This is the Artin group $A_4 =
\langle a, b \mid abab = baba \rangle$ quotiented by its center.) Set $\Sigma = \{a, b, ab = (ab)^{-1}, ba =
(ba)^{-1}, aba = b^{-1}, bab = a^{-1}\}$. Then $(G, \Sigma)$ is a 0-automatic pair. Here, $\mathrm{Next}(x) =
\{a, ab, aba\}$ if $x \in \{a, ba, aba\}$ and $\mathrm{Next}(x) = \{b, ba, bab\}$ if $x \in \{b, ab, bab\}$.

Now, the group $G$ is isomorphic to $\mathbb{F}(a) \star \{1, ba\} \simeq \mathbb{Z} \star \mathbb{Z}/2\mathbb{Z}$, see Figure 3.1.
Set $S = \{a, a^{-1} = bab, ba\}$ for the corresponding set of natural generators. We
have for instance $|b|_S = 2$ and $|ab|_S = 3$. Concentrating on the right of Figure 3.1,
it is not obvious that $(G, \Sigma)$ is 0-automatic.







4. Random walks on zero-automatic pairs 

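From now on, the setting is the following one. Let $G$ be an infinite group with
a finite set of generators $\Sigma$, with $\Sigma = \Sigma^{-1}$, such that $(G, \Sigma)$ is 0-automatic. Let
$L(G, \Sigma)$ and $\varphi$ be defined as in (6) and (7). Let $\mu$ be a probability on $\Sigma$ which
generates the whole group, that is, $\bigcup_n \mathrm{supp}(\mu^{*n}) = G$, where supp is the support
of the measure. We consider the random walk $(G, \mu)$, assumed to be transient.
Define the set of infinite normal form words $L^\infty \subset \Sigma^{\mathbb{N}}$ by

$$L^\infty = \{u = u_0 u_1 \cdots u_k \cdots \in \Sigma^{\mathbb{N}} \mid \forall i \in \mathbb{N},\ u_{i+1} \in \mathrm{Next}(u_i)\}. \qquad (8)$$

A word belongs to $L^\infty$ iff all its finite prefixes belong to $L(G, \Sigma)$. Consider the
natural action $\Sigma \times L^\infty \to L^\infty$, $(a, \xi) \mapsto a \cdot \xi$, with $a \cdot \xi = a\xi_0\xi_1 \cdots$ if $a \in \mathrm{Prev}(\xi_0)$,
$a \cdot \xi = (a * \xi_0)\xi_1 \cdots$ if $a * \xi_0 \in \Sigma$, and $a \cdot \xi = \xi_1\xi_2 \cdots$ if $a = \xi_0^{-1}$. Equip $\Sigma^{\mathbb{N}}$
with the Borel $\sigma$-algebra associated with the product topology. This induces a
$\sigma$-algebra on $L^\infty$. Given a measure $\nu^\infty$ on $L^\infty$ and $a \in \Sigma$, define the measure
$a\nu^\infty$ by: $\int f(\xi)\, d(a\nu^\infty)(\xi) = \int f(a \cdot \xi)\, d\nu^\infty(\xi)$. A probability measure $\nu^\infty$ on $L^\infty$ is
$\mu$-invariant if

$$\nu^\infty(\cdot) = \sum_{a \in \Sigma} \mu(a)\, a\nu^\infty(\cdot). \qquad (9)$$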



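Proposition 4.1. Let $(X_n)_n$ be a realization of the random walk $(G, \mu)$ and set
$Y_n = \varphi(X_n)$. There exists a r.v. $Y_\infty$ valued in $L^\infty$ such that a.s.

$$\lim_{n \to \infty} Y_n = Y_\infty,$$

meaning that the length of the common prefix between $Y_n$ and $Y_\infty$ goes to infinity
a.s. Let $\mu^\infty$ be the distribution of $Y_\infty$. The measure $\mu^\infty$ is $\mu$-invariant and is the
only $\mu$-invariant probability on $L^\infty$. We call it the harmonic measure of $(G, \mu)$. The
drift and the entropy of the random walk are given by:

$$\gamma = \sum_{x \in \Sigma} \mu(x) \Big[ -\mu^\infty(x^{-1}\Sigma^{\mathbb{N}}) + \sum_{y \in \mathrm{Next}(x)} \mu^\infty(y\Sigma^{\mathbb{N}}) \Big], \qquad (10)$$

$$h = -\sum_{x \in \Sigma} \mu(x) \int \log\Big[ \frac{dx^{-1}\mu^\infty}{d\mu^\infty}(\xi) \Big]\, d\mu^\infty(\xi), \qquad (11)$$

where $dx^{-1}\mu^\infty / d\mu^\infty$ is the Radon-Nikodym derivative of $x^{-1}\mu^\infty$ with respect to
$\mu^\infty$.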



In the context of the free group, this is proved for instance in [16, Theorem 
1.12, Theorem 4.10, Corollary 4.5]. The proofs adapt easily to the present setting. 
Several of the key arguments go back to [8], see [16] for precise references. 

Intuitively, the harmonic measure $\mu^\infty$ gives the direction in which $(X_n)_n$ goes
to infinity.



5. Markovian harmonic measure 



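Define $\mathring{\mathcal{B}} = \{x \in \mathbb{R}^\Sigma \mid \forall i,\ x_i > 0,\ \sum_i x_i = 1\}$. Consider $r \in \mathring{\mathcal{B}}$. Define the matrix
$P$ of dimension $\Sigma \times \Sigma$ by

$$P_{u,v} = \begin{cases} r(v) \Big( \sum_{x \in \mathrm{Next}(u)} r(x) \Big)^{-1} & \text{if } v \in \mathrm{Next}(u) \\ 0 & \text{otherwise.} \end{cases} \qquad (12)$$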







It is the transition matrix of a Markov chain on the state space $\Sigma$, which can be
proved to be irreducible. For convenience, set for all $a \in \Sigma$, $s(a) = \sum_{x \in \mathrm{Next}(a)} r(x)$.
Let $(U_n)_n$ be a realization of the Markov chain with transition matrix $P$ and
starting from $U_1$ such that $P\{U_1 = x\} = r(x)$. Set $U^\infty = \lim_n U_1 \cdots U_n$, and let
$\nu^\infty$ be the distribution of $U^\infty$. Clearly the support of $\nu^\infty$ is included in $L^\infty$. For
$u_1 \cdots u_k \in L(G, \Sigma)$, we have

$$\nu^\infty(u_1 \cdots u_k \Sigma^{\mathbb{N}}) = r(u_1) P_{u_1,u_2} \cdots P_{u_{k-1},u_k} \qquad (13)$$
$$= r(u_1)\, \frac{r(u_2)}{s(u_1)} \cdots \frac{r(u_k)}{s(u_{k-1})} = \frac{r(u_1)}{s(u_1)}\, \frac{r(u_2)}{s(u_2)} \cdots \frac{r(u_{k-1})}{s(u_{k-1})}\, r(u_k).$$

We call $\nu^\infty$ the Markovian multiplicative probability measure associated with $r$.

Observe that the measure $\nu^\infty$ is in general non-stationary with respect to the
translation shift $\tau : \Sigma^{\mathbb{N}} \to \Sigma^{\mathbb{N}}$, $(x_n)_n \mapsto (x_{n+1})_n$. Indeed, the distribution of the
first marginal is $r$ which is in general different from the stationary distribution of
$P$.
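
The construction of $P$ from $r$ in (12) and the cylinder formula (13) can be sketched as follows for the natural generators of the free product of cyclic groups of orders 2 and 3. The vector `r` below is an arbitrary positive probability vector chosen for illustration; it is not claimed to solve the Traffic Equations for any particular walk. The names `transition_matrix` and `cylinder` are hypothetical.

```python
from fractions import Fraction

# Natural generators of Z/2Z * Z/3Z: 'a' (order 2) and 'b', 'B' = b^2 (order 3).
NEXT = {"a": ("b", "B"), "b": ("a",), "B": ("a",)}

def transition_matrix(r):
    """P[u][v] = r(v) / s(u) if v is in Next(u), and 0 otherwise, as in (12)."""
    s = {u: sum(r[x] for x in NEXT[u]) for u in r}
    return {u: {v: (r[v] / s[u] if v in NEXT[u] else Fraction(0)) for v in r} for u in r}

def cylinder(r, word):
    """nu(u_1 ... u_k Sigma^N) = r(u_1) P[u_1][u_2] ... P[u_{k-1}][u_k], as in (13)."""
    P = transition_matrix(r)
    prob = r[word[0]]
    for u, v in zip(word, word[1:]):
        prob *= P[u][v]
    return prob

r = {"a": Fraction(1, 2), "b": Fraction(1, 3), "B": Fraction(1, 6)}  # arbitrary choice
P = transition_matrix(r)
print([sum(P[u].values()) for u in r])  # each row sums to 1
print(cylinder(r, "abaB"))
```

Dividing by $s(u)$ is what turns each row into a probability distribution supported on $\mathrm{Next}(u)$, so that the chain only produces normal form words.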



5.1. The main theorem 

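The Traffic Equations associated with $(G, \mu)$ are defined by: $\forall a \in \Sigma$,

$$x(a) = \mu(a) \sum_{u \in \mathrm{Next}(a)} x(u) + \sum_{u * v = a} \mu(u)\, x(v) + \sum_{u \in \mathrm{Prev}(a)} \mu(u^{-1})\, \frac{x(u)}{\sum_{v \in \mathrm{Next}(u)} x(v)}\, x(a). \qquad (14)$$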

We are now ready to state the main result. 

Theorem 5.1. Let $(G, \Sigma)$ be 0-automatic. Let $\mu$ be a probability measure on $\Sigma$ such
that $\bigcup_n \mathrm{supp}(\mu^{*n}) = G$ and such that the random walk $(G, \mu)$ is transient. Then
the Traffic Equations (14) have a unique solution $x \in \mathring{\mathcal{B}}$. The harmonic measure
$\mu^\infty$ of the random walk is the Markovian multiplicative measure associated with $x$.



The harmonic measure is not stationary in general. However, we have the 
following result: 

Proposition 5.2. Let $H$ be a finite group and let $(G_i)_{i \in I}$ be a finite family of copies
of $H$. Let $\pi_i$ be the isomorphism between $G_i$ and $H$. Let $\nu$ be a probability measure
on $H \setminus \{1_H\}$. Consider the free product $G = \star_{i \in I} G_i$ and let $\mu$ be the probability
measure on $\Sigma = \bigsqcup_i G_i \setminus \{1_{G_i}\}$ defined by: $\forall g \in G_i \setminus \{1_{G_i}\}$, $\mu(g) = \nu \circ \pi_i(g) / |I|$.
Then the harmonic measure of $(G, \mu)$ is stationary and ergodic.

Consider now a random walk $(G, \mu)$ where $G = G_1 \star G_2$ is the free product
of two arbitrary finite groups. Then the harmonic measure $\mu^\infty$ is 2-stationary,
meaning that: $\forall u \in L(G, \Sigma)$, $\forall k \in \mathbb{N}$, $\mu^\infty(u\Sigma^{\mathbb{N}}) = \mu^\infty(\Sigma^{2k} u \Sigma^{\mathbb{N}})$.

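Starting from (10)-(11) and using Theorem 5.1, we obtain a simple formula
for the drift:

$$\gamma = \sum_{a \in \Sigma} \mu(a) \Big[ -r(a^{-1}) + \sum_{b \in \mathrm{Next}(a)} r(b) \Big], \qquad (15)$$

and for the entropy:

$$h = -\sum_{a \in \Sigma} \mu(a) \Big[ \log\Big[\frac{1}{q(a^{-1})}\Big]\, r(a^{-1}) + \sum_{b \,:\, a * b \in \Sigma} \log\Big[\frac{q(a * b)}{q(b)}\Big]\, r(b) + \log[q(a)]\, r(\mathrm{Next}(a)) \Big],$$

where $r(\mathrm{Next}(a)) = \sum_{b \in \mathrm{Next}(a)} r(b)$ and $q(a) = r(a) / r(\mathrm{Next}(a))$.

In particular, if the probabilities $\mu(a)$ are algebraic numbers, then the drift
and the entropy are algebraic numbers.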
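
As an illustration, the Traffic Equations and the drift formula can be checked exactly in the simplest case, the simple random walk on the free group with two generators, where symmetry suggests the solution $r \equiv 1/4$. The sketch below assumes that the Traffic Equations read $x(a) = \mu(a)\sum_{u \in \mathrm{Next}(a)} x(u) + \sum_{u*v=a}\mu(u)x(v) + \sum_{u \in \mathrm{Prev}(a)} \mu(u^{-1})\,\frac{x(u)}{\sum_{v \in \mathrm{Next}(u)} x(v)}\,x(a)$ and that the drift is $\sum_a \mu(a)[-r(a^{-1}) + \sum_{b \in \mathrm{Next}(a)} r(b)]$; the function names are hypothetical.

```python
from fractions import Fraction

GENS = "aAbB"                                        # A = a^-1, B = b^-1
INV = {"a": "A", "A": "a", "b": "B", "B": "b"}
NEXT = {g: [h for h in GENS if h != INV[g]] for g in GENS}   # Next(g) = Prev(g)

def traffic_rhs(x, mu, a):
    """Right-hand side of (14); on a free group no product u * v equals a."""
    first = mu[a] * sum(x[u] for u in NEXT[a])
    third = sum(mu[INV[u]] * x[u] / sum(x[v] for v in NEXT[u]) for u in NEXT[a]) * x[a]
    return first + third

def drift(r, mu):
    """Formula (15): sum over a of mu(a) * (-r(a^-1) + sum of r over Next(a))."""
    return sum(mu[a] * (-r[INV[a]] + sum(r[b] for b in NEXT[a])) for a in GENS)

mu = {g: Fraction(1, 4) for g in GENS}   # simple random walk
r = {g: Fraction(1, 4) for g in GENS}    # candidate solution, guessed by symmetry
print(all(traffic_rhs(r, mu, a) == r[a] for a in GENS))  # True
print(drift(r, mu))                                       # 1/2
```

The value 1/2 agrees with the classical drift of the simple random walk on a free group of rank two.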







5.2. Harmonic functions 

For all $u \in G$, define $q(u) = P\{\exists n \mid X_n = u\}$, the probability of ever reaching
$u$. If $\varphi(u) = u_1 \cdots u_k \in L(G, \Sigma)$, by the strong Markov property, we have $q(u) =
q(u_1) q(u_2) \cdots q(u_k)$. Therefore, all we need to compute are the quantities $q(a)$, $a \in
\Sigma$. These quantities satisfy the equations: $\forall a \in \Sigma$,

$$q(a) = \mu(a) + \sum_{u * v = a} \mu(u)\, q(v) + q(a) \sum_{c \in \Sigma \setminus \Sigma_a} \mu(c)\, q(c^{-1}). \qquad (16)$$



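Proposition 5.3. The Equations (16) characterize $(q(a))_{a \in \Sigma}$. Let $r$ be the unique
solution to the Traffic Equations in $\mathring{\mathcal{B}}$. We have: $\forall a \in \Sigma$, $q(a) = r(a) / [\sum_{b \in \mathrm{Next}(a)} r(b)]$.
The harmonic measure satisfies: $\forall u_1 \cdots u_k \in L(G, \Sigma)$, $\mu^\infty(u_1 \cdots u_k \Sigma^{\mathbb{N}}) = q(u_1) \cdots
q(u_{k-1})\, r(u_k)$.

Consider now a free product of finite groups $G_1 \star \cdots \star G_k$. Set $\Sigma_i = G_i \setminus \{1_{G_i}\}$,
and $q(\Sigma_i) = \sum_{a \in \Sigma_i} q(a)$. We have: $\forall a \in \Sigma_i$, $r(a) = q(a) / [1 + q(\Sigma_i)]$. At last, for a
finitely generated free group $\mathbb{F}(S)$, we have: $\forall a \in S \sqcup S^{-1}$, $r(a) = q(a) / (1 + q(a))$.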



For a general 0-automatic pair, there is no simple formula giving r as a 
function of q. We come back to this point at the end of §5.2. 

Specializing Proposition 5.3 to the free product of two groups, we obtain an 
unexpected identity: 

$$q(\Sigma_1) \, q(\Sigma_2) = 1 .$$

In words, the average number of visited elements in $\Sigma_1$ is the inverse of the average
number of visited elements in $\Sigma_2$. It is the identity used to prove the last part of
Proposition 5.2.
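
The identity can be checked in two lines from the free product formula of Proposition 5.3. The sketch below assumes that formula reads $r(a) = q(a)/[1 + q(\Sigma_i)]$ for $a \in \Sigma_i$, and uses that $r$ sums to 1 over $\Sigma = \Sigma_1 \cup \Sigma_2$; the shorthand $r(\Sigma_i) = \sum_{a \in \Sigma_i} r(a)$ is introduced here for convenience and is not notation from the text.

```latex
\begin{align*}
r(\Sigma_i) &= \sum_{a \in \Sigma_i} \frac{q(a)}{1 + q(\Sigma_i)}
             = \frac{q(\Sigma_i)}{1 + q(\Sigma_i)}, \qquad i = 1, 2, \\
r(\Sigma_1) + r(\Sigma_2) = 1
  &\iff q(\Sigma_1)\bigl(1 + q(\Sigma_2)\bigr) + q(\Sigma_2)\bigl(1 + q(\Sigma_1)\bigr)
        = \bigl(1 + q(\Sigma_1)\bigr)\bigl(1 + q(\Sigma_2)\bigr) \\
  &\iff q(\Sigma_1)\, q(\Sigma_2) = 1 .
\end{align*}
```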



The knowledge of $q(\cdot)$ makes it possible to determine explicitly the minimal harmonic
functions. A positive harmonic function is a function $f : G \to \mathbb{R}_+$ such that
$\forall u \in G$, $\sum_{a \in \Sigma} f(u * a) \mu(a) = f(u)$. A positive harmonic function $f$ is minimal
if $f(1_G) = 1$ and if for any positive harmonic function $g$ such that $f \geq g$, there
exists $c \in \mathbb{R}_+$ such that $f = cg$.

For $g \in G$, set $\Gamma(g) = \sum_{n \in \mathbb{N}} P\{X_n = g\}$ (Green function), and define the map
$K_g : G \to \mathbb{R}_+$ by

$$K_g(x) = \frac{\Gamma(x^{-1} * g)}{\Gamma(g)} = \frac{q(x^{-1} * g)}{q(g)} .$$

The right-hand equality is obtained by observing that $\Gamma(v) = q(v)\Gamma(1_G)$ for all $v$.



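For $\xi \in L^\infty$, define $K_\xi : G \to \mathbb{R}_+$ by $K_\xi = \lim_n K_{\xi[n]}$, where $\xi[n]$ is the
length $n$ prefix of $\xi$. Set $\xi = \xi_0 \xi_1 \cdots$. For $x \in G$ with $\varphi(x) = x_0 \cdots x_{n-1} \in L(G, \Sigma)$,
set $k = |\varphi(x) \wedge \xi|$, the length of the longest joint prefix of $\varphi(x)$ and $\xi$. We have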

K (t) = I ■ • -9(^n-i)]/[9(^o) • ••9(^fc-i)j if G Prev(a) 

^ l[9(a;fe^ *6)9(a;fc+i)-- •9(a:“Ai)] / [9(^0) •••9(6)] otherwise 



Proposition 5.4. The minimal positive harmonic functions are the functions
$(K_\xi)_{\xi \in L^\infty}$.

The set $\{K_\xi, \ \xi \in L^\infty\}$ forms the Martin boundary of the random walk.




Random walks on groups 








Figure 6.2 A nearest neighbor random walk on Z/2Z ★ Z/3Z (left), and the simple random
walk on Z/4Z ★ Z/4Z (right).



Comparison with the literature. The importance of the Equations (16) in $q$ is
well-known. In the seminal paper of Dynkin & Malyutov [6], these equations are
explicitly solved in the free group case, and the harmonic functions are then
derived as above. In [21], the authors prove that $\mu^\infty$ is Markovian for the free
group as follows: they use the expression for $q$ obtained in [6], they define $r$ as
in the second part of Proposition 4.1, and then prove that the measure defined
by $\nu^\infty(u_1 \cdots u_k \Sigma^{\mathbb{N}}) = q(u_1) \cdots q(u_{k-1}) r(u_k)$ is the harmonic measure. For trees of
finite cone types [19], the proof that the harmonic measure is Markovian is also
centered around the analog of the Equations (16). The series version of (16) is the
main ingredient in the Sawyer & Steger approach [21] for computing $\gamma$, see §6.1. See
also [15, 19]. At last, the series version of (16) can be used for finite range random
walks on free groups to get qualitative results (central limit theorem, asymptotic
type of $P\{X_n = 1_G\}$), see Lalley [14].

Here the proof that $\mu^\infty$ is Markovian is different and based on the Traffic
Equations (14) instead of the Equations (16). This is a more direct path and the
only way to proceed in the general case since we cannot retrieve a solution to the
Traffic Equations from a solution to (16).



6. Explicit computations 

In Theorem 5.1, the harmonic measure is completely determined via the vector r 
which is itself the solution of an explicit finite set of polynomial equations. In small 
or simple examples, it is possible to go further, that is, to solve these equations to 
get closed form formulas for the harmonic measure, the drift, the entropy, or the 
harmonic functions. It is the program that we now carry out for several specific 
and interesting cases. In §6.2-§6.6, we compute explicitly r, and we illustrate by 
providing closed form formulas for the drift. The computations have been carried 
out using Maple and Mathematica. 

We first discuss alternative existing methods for computing the drift. (They 
do not work for computing the entropy for instance.) 






6.1. Comparison with other methods for computing the drift 

Let $G = G_1 \star G_2$ be a free product of two finite groups. Set $\Sigma_i = G_i \setminus \{1_{G_i}\}$ and
$\Sigma = \Sigma_1 \cup \Sigma_2$. Let $\mu$ be a probability measure on $\Sigma$ such that: $\forall i, \forall x \in \Sigma_i$, $\mu(x) =
\mu(\Sigma_i)/\#\Sigma_i$. In words, $\mu$ is uniform on each of the two groups. Consider the random
walk $(G, \mu)$. Here, computing the drift becomes elementary and does not require
knowing that the harmonic measure is Markovian.

Set $p = \mu(\Sigma_1)$, $k_1 = \#\Sigma_1$, and $k_2 = \#\Sigma_2$. Denote by $i \in \{1,2\}$ the set of
elements of $G$ whose normal form representative ends with a letter in $\Sigma_i$. When
we are far from the unit element $1_G$, the random walk on $G$ induces a Markov
chain on $\{1,2\}$ with transition matrix:

$$P = \begin{pmatrix} p(k_1 - 1)/k_1 & p/k_1 + 1 - p \\ (1-p)/k_2 + p & (1-p)(k_2 - 1)/k_2 \end{pmatrix} .$$

Let $\pi$ be the stationary distribution, that is $\pi P = \pi$, $\pi(1) + \pi(2) = 1$. By the
Ergodic Theorem for Markov Chains, we have $\lim_n [\, P\{X_n \in 1\}, \ P\{X_n \in 2\} \,] = \pi$.
The value of the drift follows readily:

$$\gamma = \lim_n \frac{1}{n} \sum_{i=0}^{n-1} E\big[\, |X_{i+1}|_\Sigma - |X_i|_\Sigma \,\big] = \frac{2p(1-p)(k_1 k_2 - 1)}{(1-p)k_1 + p k_2 + k_1 k_2} .$$
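
The equilibrium argument can be replayed numerically. The sketch below is only an illustration, not the authors' computation: it assumes the two-state transition matrix described in the text, takes its stationary law, averages the expected one-step change of the normal form length, and compares the result with the closed form $2p(1-p)(k_1k_2-1)/((1-p)k_1 + pk_2 + k_1k_2)$. The function names are hypothetical.

```python
from fractions import Fraction


def induced_chain_drift(p, k1, k2):
    """Drift obtained from the stationary law of the two-state chain on {1, 2}."""
    # Off-diagonal entries of the transition matrix P.
    p12 = p / k1 + (1 - p)      # last letter moves from Sigma_1 to Sigma_2
    p21 = (1 - p) / k2 + p      # last letter moves from Sigma_2 to Sigma_1
    # Stationary law of a two-state chain: pi(1) * p12 = pi(2) * p21.
    pi1 = p21 / (p12 + p21)
    pi2 = p12 / (p12 + p21)
    # Expected one-step change of the length of the normal form in each state:
    # +1 when a letter of the other factor is appended, -1 on a cancellation.
    step1 = (1 - p) - p / k1
    step2 = p - (1 - p) / k2
    return pi1 * step1 + pi2 * step2


def closed_form_drift(p, k1, k2):
    """Closed form for the drift stated in the text."""
    return 2 * p * (1 - p) * (k1 * k2 - 1) / ((1 - p) * k1 + p * k2 + k1 * k2)


if __name__ == "__main__":
    # Simple random walk on Z/2Z * Z/3Z: p = mu(a) = 1/3, k1 = 1, k2 = 2.
    p = Fraction(1, 3)
    print(induced_chain_drift(p, 1, 2), closed_form_drift(p, 1, 2))
```

Exact rational arithmetic is used so that the two expressions can be compared with `==` rather than with a tolerance.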



Now assume that $G = G_1 \star \cdots \star G_k$, where the $G_i$ are finite groups, and assume
that $\forall i, \forall x \in \Sigma_i = G_i \setminus \{1_{G_i}\}$, $\mu(x) = \mu(\Sigma_i)/\#\Sigma_i$. Then each of the finite groups
can be collapsed into a single node, and the random walk $(G, \mu)$ projects into a
nearest neighbor random walk on a tree with $k$ cone types. Therefore, the formulas
for the drift given in [19] apply.

With the exception of (22), none of the formulas obtained in §6.2-§6.6 corresponds
to the above two situations.

Now let us discuss the Sawyer-Steger method [21]. It was developed for the
free group but adapts to the present situation. Let $(G, \Sigma)$ be a 0-automatic pair.
For $g \in G$, define the r.v. $\tau(g) = \min\{n \mid X_n = g\}$ (with $\tau(g) = \infty$ if $g$ is not
reached). Define the first-passage generating series $S \in \mathbb{R}[[y,z]]$ by:

$$S(y,z) = \sum_{k \in \mathbb{N}} y^k \sum_{|g|_\Sigma = k} \ \sum_{n \in \mathbb{N}} P\{\tau(g) = n\} \, z^n . \qquad (17)$$

Let $S_y$ and $S_z$ denote the partial derivatives of $S$ with respect to $y$ and $z$. Adapting
the results in [21, Theorem 2.2 and Section 6] (see also [19, Section 6]), one obtains
the following formula for the drift:

$$\gamma = S_y(1,1)/S_z(1,1) . \qquad (18)$$

Now assume, for simplicity, that $G = G_1 \star \cdots \star G_k$ is a free product of finite
groups. Set $\Sigma_i = G_i \setminus \{1_{G_i}\}$ and $\Sigma = \cup_i \Sigma_i$. For $u \in G$, define the series
$q(u,z) = \sum_{n \in \mathbb{N}} P\{\tau(u) = n\} \, z^n$. Observe that $q(u,1) = q(u)$, the probability of
ever reaching $u$, defined in §5.2. In particular, if one encapsulates the Equations
(16) as $q(a) = \Phi_a(q)$, then $q(a,z) = z \, \Phi_a(q(\cdot, z))$. Using this last set of Equations,
the corresponding set of Equations for the derivatives $\partial q(a,z)/\partial z$, and playing
around with the Equations (17) and (18), we get:

$$\gamma = \frac{\sum_i q_i/(1+q_i)^2}{\sum_i q_i'/(1+q_i)^2} , \qquad q_i = \sum_{u \in \Sigma_i} q(u,1) , \quad q_i' = \sum_{u \in \Sigma_i} \frac{\partial q(u,z)}{\partial z} \Big|_{z=1} . \qquad (19)$$








Figure 6.3 The drift of (Z/2Z ★ Z/3Z, $\mu$) as a function of $p = \mu(b)$ and $q = \mu(b^2)$ (left),
and the drift of $(B_3, \mu)$ as a function of $p = \mu(a) = \mu(b) = 1/2 - \mu(a^{-1}) = 1/2 - \mu(b^{-1})$
(right).



This formula is more complicated than the one obtained in (15). The reason for this
is easy to understand. There is much more information in the series $S$ than what
is relevant for computing $\gamma$. In particular, all the ‘transient’ behavior of the walk is
encoded into it. Our approach centers around the knowledge that $\mu^\infty$ is Markovian.
It allows us to compute $\gamma$ via a simple ‘equilibrium’ argument: $\mu * \mu^\infty = \mu^\infty$. This is
a more direct path. Consequently, it gives more chances to solve the equations to
get a closed form formula. As an exercise, we tried to retrieve the results for $\gamma$ in
§6.2-§6.6 using (19). We succeeded in two cases: formulas (21) and (23). On the
other hand, the results in §6.4-§6.6 seem totally out of reach.



6.2. Random walks on Z/2Z ★ Z/3Z

The group Z/2Z ★ Z/3Z is isomorphic to the modular group PSL(2, Z) (i.e. the
group of 2x2 matrices with integer entries and determinant 1, quotiented by ±Id).
Let $a$ and $b$ be the respective generators of Z/2Z and Z/3Z. A possible representation
of the group is



$$a = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} , \qquad (20)$$



Quoting [4, Chapter II.B]: “The modular group is one of the most important 
groups in mathematics” . 



Consider a general nearest neighbor random walk (Z/2Z ★ Z/3Z, $\mu$). Set
$\mu(a) = r$, $\mu(b) = p$, $\mu(b^2) = q$. The Traffic Equations can be solved explicitly.







The resulting value of the drift is:

$$\gamma = \frac{2r \big( pq - p - q + \sqrt{(p^2 + q^2)(3 + (r+p)^2 + (r+q)^2) + 2pq(2r+1)} \, \big)}{(r+p)^2 + (r+q)^2 - pq + 2} . \qquad (21)$$

For instance, the drift is maximized for $r = z_0$, $p = 1 - z_0$, $q = 0$ (or $r = z_0$, $p =
0$, $q = 1 - z_0$), where $z_0$ is the root of $[z^6 + 12z^4 - 4z^3 + 47z^2 - 48z + 12]$ whose
numerical value is $0.490275 \cdots$. The corresponding numerical value of the drift is
$\gamma_{\max} = 0.163379 \cdots$. This was not a priori obvious!
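
The maximization can be reproduced with a few lines of numerics. This is a sketch under stated assumptions, not the Maple or Mathematica computation mentioned in the text: it assumes that (21) reads $\gamma = 2r(pq - p - q + \sqrt{(p^2+q^2)(3+(r+p)^2+(r+q)^2) + 2pq(2r+1)}) / ((r+p)^2+(r+q)^2-pq+2)$ and that the polynomial for $z_0$ is $z^6 + 12z^4 - 4z^3 + 47z^2 - 48z + 12$. The function names are hypothetical.

```python
import math


def drift_z2_z3(r, p, q):
    """Drift of the nearest neighbor walk on Z/2Z * Z/3Z, formula (21) as assumed above."""
    root = math.sqrt((p * p + q * q) * (3 + (r + p) ** 2 + (r + q) ** 2)
                     + 2 * p * q * (2 * r + 1))
    return 2 * r * (p * q - p - q + root) / ((r + p) ** 2 + (r + q) ** 2 - p * q + 2)


def z0_polynomial(z):
    """Polynomial whose root near 0.49 is the maximizing value of r."""
    return z ** 6 + 12 * z ** 4 - 4 * z ** 3 + 47 * z ** 2 - 48 * z + 12


def argmax_unimodal(f, lo, hi, iterations=200):
    """Ternary search for the maximizer of a unimodal function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


def boundary_drift(r):
    """Drift restricted to the boundary case q = 0, p = 1 - r."""
    return drift_z2_z3(r, 1 - r, 0.0)


if __name__ == "__main__":
    r_star = argmax_unimodal(boundary_drift, 0.0, 1.0)
    print(r_star, boundary_drift(r_star), z0_polynomial(r_star))
```

The search is restricted to the boundary $q = 0$, which is where the text locates the maximum; it does not by itself show that no interior point does better.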



6.3. Random walks on Z/3Z ★ Z/3Z

Consider the free product Z/3Z ★ Z/3Z with $a$ and $b$ being the respective generators
of the two cyclic groups. Consider the probability $\mu$ such that $\mu(a) = \mu(a^2) =
p$, $\mu(b) = \mu(b^2) = q = 1/2 - p$. Solving the Traffic Equations yields:

$$r(a) = r(a^2) = \frac{1 + 2p}{6} , \qquad r(b) = r(b^2) = \frac{1 - p}{3} .$$

We have $\mu^\infty(u_1 u_2 \cdots u_k \Sigma^{\mathbb{N}}) = r(u_1)(1/2)^{k-1}$, i.e. $\mu^\infty$ is the uniform measure except
for the distribution of the initial element of the normal form. The drift is

$$\gamma = 4pq = 2p(1 - 2p) . \qquad (22)$$

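Assume now that $\mu(a) = \mu(b) = p$, $\mu(a^2) = \mu(b^2) = q = 1/2 - p$. We obtain, for $p \neq 1/4$:

$$r(a) = r(b) = \frac{3 - 4p - \sqrt{16p^2 - 8p + 5}}{4(1 - 4p)} , \qquad r(a^2) = r(b^2) = \frac{1}{2} - r(a) .$$

The harmonic measure is the Markovian multiplicative measure associated with
$r$. It is stationary, see Proposition 5.2. The drift is

$$\gamma = -\frac{1}{4} + \frac{1}{4} \sqrt{16p^2 - 8p + 5} . \qquad (23)$$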
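
Formula (23) can be cross-checked against a direct solution of the Traffic Equations in this symmetric case. The sketch below rests on assumptions that are not quoted from the text: writing $x = r(a) = r(b)$ and $y = r(a^2) = r(b^2) = 1/2 - x$, the invariance $\mu * \mu^\infty = \mu^\infty$ is taken to give $x = p/2 + qy + 2x(qx + py)$ and $y = q/2 + px + 2y(qx + py)$, which reduce to a single quadratic in $x$; (23) is taken to read $\gamma = -1/4 + \sqrt{16p^2 - 8p + 5}/4$. The function names are hypothetical.

```python
import math


def traffic_solution(p):
    """Solve the reduced Traffic Equations for mu(a)=mu(b)=p, mu(a^2)=mu(b^2)=1/2-p."""
    q = 0.5 - p
    # Reduced quadratic A x^2 + B x + C = 0 for x = r(a) = r(b).
    A, B, C = 2 * (q - p), p - q - 1, 0.25
    disc = B * B - 4 * A * C
    # Rationalized form of the root (-B - sqrt(disc)) / (2A); it stays valid when A = 0.
    x = 2 * C / (-B + math.sqrt(disc))
    return x, 0.5 - x


def traffic_residuals(p):
    """Residuals of the two assumed Traffic Equations at the computed solution."""
    q = 0.5 - p
    x, y = traffic_solution(p)
    cancel = 2 * (q * x + p * y)   # total weight of the cancellation terms
    return (x - (p / 2 + q * y + x * cancel),
            y - (q / 2 + p * x + y * cancel))


def drift_from_traffic(p):
    """Drift from formula (15): sum of mu(c) * (-r(c^-1) + r(Next(c))), with r(Next(c)) = 1/2."""
    q = 0.5 - p
    x, y = traffic_solution(p)
    return 2 * p * (0.5 - y) + 2 * q * (0.5 - x)


def drift_closed_form(p):
    """Formula (23) as assumed above."""
    return -0.25 + 0.25 * math.sqrt(16 * p * p - 8 * p + 5)


if __name__ == "__main__":
    for p in (0.0, 0.125, 0.25, 0.4, 0.5):
        print(p, drift_from_traffic(p), drift_closed_form(p))
```

At $p = 1/4$ both expressions give $1/4$, the value listed for Z/3Z ★ Z/3Z in §6.5.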

At last, consider the case $\mu(a) = p$, $\mu(a^2) = q$, and $\mu(b) = \mu(b^2) = (1 - p -
q)/2$. Solving explicitly the Traffic Equations is feasible but provides formulas which
are too lengthy to be reproduced here. However, for the drift, several simplifications
occur, and we obtain the following formula:

$$\gamma = 2(1 - p - q) \sqrt{(p^2 + q^2 + pq)/(p^2 + q^2 - 2pq + 3)} .$$

For the general nearest neighbor random walk on Z/3Z ★ Z/3Z, we did not
succeed in solving completely the Traffic Equations.


6.4. The simple random walk on Z/2Z ★ Z/kZ

The groups Z/2Z ★ Z/kZ are known as the Hecke groups. We consider the random
walk (Z/2Z ★ Z/kZ, $\mu$) with $\mu(a) = \mu(b) = \mu(b^{-1}) = 1/3$.

Consider the applications $G_n : [0,1] \to \mathbb{R}$, $n \in \mathbb{N}$, defined by

$$G_0(x) = \frac{2x + 1}{4} , \quad G_1(x) = x , \quad \forall n \geq 2, \ G_n(x) = \frac{8(1-x)}{3 - 2x} \, G_{n-1}(x) - G_{n-2}(x) .$$

For instance, $G_4(x) = (1552x^4 - 4416x^3 + 4296x^2 - 1600x + 165)/(4(2x - 3)^3)$.







Theorem 6.1. For $k \geq 3$, the equation $G_{k-1}(x) = x$ has a unique solution in
$(0, 1/2)$ that we denote by $y_k$. The harmonic measure of (Z/2Z ★ Z/kZ, $\mu$) is the Markovian
multiplicative measure associated with $r$: $r(a) = G_0(y_k)$, $\forall i \in \{1, \ldots, k-1\}$,
$r(b^i) = G_i(y_k)$. The drift is $\gamma_k = (1 - 2y_k)/3$. It is strictly increasing in $k$ and
$\lim_k \gamma_k = 2/9$.



Here is a table of the first values of $\gamma$, given either in closed form or numerically
when no closed form could be found. Set $Z_k$ = Z/kZ.

         Z_2 ★ Z_3    Z_2 ★ Z_4     Z_2 ★ Z_5        Z_2 ★ Z_6     Z_2 ★ Z_7     Z_2 ★ Z_8
    γ    2/15         (√7 - 1)/9    (2√61 - 4)/57    0.21341...    0.21792...    0.22010...
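
The tabulated values can be recovered by solving $G_{k-1}(x) = x$ numerically. The sketch below is an illustration under assumptions: it takes the recursion to be $G_0(x) = (2x+1)/4$, $G_1(x) = x$, $G_n(x) = \frac{8(1-x)}{3-2x} G_{n-1}(x) - G_{n-2}(x)$, relies on the uniqueness of the root in $(0, 1/2)$ stated in Theorem 6.1, and uses plain bisection. The function names are hypothetical.

```python
import math


def G(n, x):
    """G_n(x) from the three-term recursion assumed in the lead-in."""
    g_prev, g = (2 * x + 1) / 4, x          # G_0(x), G_1(x)
    if n == 0:
        return g_prev
    c = 8 * (1 - x) / (3 - 2 * x)
    for _ in range(n - 1):
        g_prev, g = g, c * g - g_prev
    return g


def hecke_drift(k, iterations=200):
    """Drift (1 - 2 y_k) / 3, where y_k solves G_{k-1}(x) = x in (0, 1/2)."""
    # x = 1/2 always solves the equation, so the bracket stops just before it.
    lo, hi = 0.0, 0.5 - 1e-6
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if G(k - 1, mid) - mid < 0:
            lo = mid
        else:
            hi = mid
    y_k = (lo + hi) / 2
    return (1 - 2 * y_k) / 3


if __name__ == "__main__":
    for k in range(3, 9):
        print(k, hecke_drift(k))
```

Bisection is enough here because $G_{k-1}(x) - x$ is negative at 0 and positive just below 1/2, so the bracket isolates the root $y_k$ and excludes the endpoint root $x = 1/2$.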



6.5. The simple random walk on Z/kZ ★ Z/kZ

Consider the free product $G_1 \star G_2$ = Z/kZ ★ Z/kZ. Set $\Sigma_1 = G_1 \setminus \{1_{G_1}\} = \{a, \ldots, a^{k-1}\}$,
$\Sigma_2 = G_2 \setminus \{1_{G_2}\} = \{b, \ldots, b^{k-1}\}$ and $\Sigma = \Sigma_1 \cup \Sigma_2$. Consider the simple
random walk (Z/kZ ★ Z/kZ, $\mu$) with $\mu(a) = \mu(b) = \mu(a^{-1}) = \mu(b^{-1}) = 1/4$. See
Figure 6.2.

Consider the applications $F_n : [0,1] \to \mathbb{R}$, $n \in \mathbb{N}$, defined by

$$F_0(x) = 1 , \quad F_1(x) = x , \quad \forall n \geq 2, \ F_n(x) = 2(2 - x) F_{n-1}(x) - F_{n-2}(x) . \qquad (24)$$

Theorem 6.2. For $k \geq 3$, the equation $F_k(x) = 1$ has a unique solution in $(0,1)$ that
we denote by $x_k$. The harmonic measure of $(G, \mu)$ is the Markovian multiplicative
measure associated with $r$: $\forall i \in \{1, \ldots, k-1\}$, $r(a^i) = r(b^i) = F_i(x_k)/2$. The drift
is $\gamma_k = (1 - x_k)/2$. It is strictly increasing in $k$ and $\lim_k \gamma_k = 1/3$.





         Z_3 ★ Z_3    Z_4 ★ Z_4     Z_5 ★ Z_5      Z_6 ★ Z_6     Z_7 ★ Z_7     Z_8 ★ Z_8
    γ    1/4          (√5 - 1)/4    (√13 - 1)/8    0.33085...    0.33251...    0.33306...
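
Theorem 6.2 translates directly into a short numerical routine. The sketch below is illustrative only: it implements the recursion (24), locates $x_k$ by bisection on $F_k(x) - 1$ (relying on the uniqueness of the root in $(0,1)$ stated in the theorem), and returns $\gamma_k = (1 - x_k)/2$. The function names are hypothetical.

```python
import math


def F(n, x):
    """F_n(x) from recursion (24): F_0 = 1, F_1 = x, F_n = 2(2 - x) F_{n-1} - F_{n-2}."""
    f_prev, f = 1.0, x
    if n == 0:
        return f_prev
    for _ in range(n - 1):
        f_prev, f = f, 2 * (2 - x) * f - f_prev
    return f


def solve_xk(k, iterations=200):
    """Root x_k of F_k(x) = 1 in (0, 1), found by bisection."""
    # x = 1 always solves the equation, so the bracket stops just before it.
    lo, hi = 0.0, 1.0 - 1e-6
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if F(k, mid) - 1 < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def free_product_drift(k):
    """Drift of the simple random walk on Z/kZ * Z/kZ, gamma_k = (1 - x_k) / 2."""
    return (1 - solve_xk(k)) / 2


if __name__ == "__main__":
    for k in range(3, 9):
        print(k, free_product_drift(k))
```

The values for $k = 3, 4, 5$ can be compared with the closed forms in the table, and the later ones with its numerical entries.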



6.6. Random walks on Artin Groups with 2 generators

The Artin group with 2 generators ($k \geq 3$) is the group with finite presentation

$$A_k = \langle \, a, b \mid \mathrm{prod}(a, b; k) = \mathrm{prod}(b, a; k) \, \rangle , \qquad (25)$$

where $\mathrm{prod}(a, b; k) = ababa \ldots$, with $k$ terms in the product on the right-hand
side. Observe that $A_3 = B_3$, the well-known braid group over three strands.

Set $\Sigma = \{a, a^{-1}, b, b^{-1}\}$. The pair $(A_k, \Sigma)$ is not 0-automatic. In particular
there is no natural notion of geodesic normal form over the generators $\Sigma$. However,
we are able to go back to the 0-automatic framework by a series of intermediate
steps, that we now list.

Consider the random walk $(A_k, \mu)$ where $\mu$ is a probability on $\Sigma$. Write the
group elements in Garside normal form [9]. This requires the switch to a new set of
generators. In particular, the Garside normal form is not geodesic for the generators
$\Sigma$. Let $Z$ be the center of $A_k$. Consider the induced random walk on $A_k/Z$, the
group quotiented by its center. Assume that $\mu(a) = \mu(b)$, $\mu(a^{-1}) = \mu(b^{-1})$. Show
that the induced random walk behaves like a nearest neighbor random walk on
Z/kZ ★ Z/kZ. Use the results of §6.3 and §6.5. Go back from $A_k/Z$ to $A_k$. Go back
to the natural generators.

In the end, what is lost is an explicit description of the harmonic measure of
$(A_k, \mu)$, but what remains is an explicit formula for the drift.









Theorem 6.3. Consider the random walk $(B_3, \mu)$ where $\mu$ is a probability measure
on $\Sigma$ such that $\mu(a) = \mu(b) = p$, $\mu(a^{-1}) = \mu(b^{-1}) = 1/2 - p$. The drift with respect
to the natural generators $\Sigma = \{a, b, a^{-1}, b^{-1}\}$ is

7(p) = max[l-4p,g{l/2-p),g{p),-l+Ap\, g{p) = ^1 + 4 ^ ' 

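Consider the simple random walk $(A_k, \mu)$ with $\mu(a) = \mu(a^{-1}) = \mu(b) = \mu(b^{-1}) =
1/4$. Let $\gamma_k$ be the drift with respect to $\Sigma$. We have

$$\gamma_k = \begin{cases} (1 - x_k) \big[ \sum_{i=1}^{j-1} i F_i(x_k) + (j/2) F_j(x_k) \big] & \text{if } k = 2j , \\ (1 - x_k) \big[ \sum_{i=1}^{j} i F_i(x_k) \big] & \text{if } k = 2j + 1 , \end{cases}$$

where the $F_i$ were defined in (24). The drift $\gamma_k$ is strictly increasing in $k$, and
$\lim_k \gamma_k = 1/2$.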

Set $u = 7/24 + 11/[24(71 + 6\sqrt{177})^{1/3}] - (71 + 6\sqrt{177})^{1/3}/24 = 0.155979 \cdots$.
The function $\gamma(p)$, represented on Figure 6.3, has several unexpected characteristics:
it is linear on the intervals $[0, u]$ and $[1/2 - u, 1/2]$ but not on the interval
$[u, 1/2 - u]$; and it has 3 points of non-differentiability: $u$, $1/4$, $1/2 - u$. Hence, the
behavior of the braid model has some kind of phase transitions whose physical
interpretation is intriguing.





         A_3    A_4            A_5              A_6            A_7            A_8
    γ    1/4    (√5 - 1)²/4    (√13 - 1)²/16    0.462598...    0.475221...    0.487636...
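
The drift of the simple random walk on $A_k$ only needs the functions $F_i$ of (24) and the root $x_k$ of Theorem 6.2. The sketch below is an illustration under assumptions: it takes the formula of Theorem 6.3 to be $\gamma_k = (1 - x_k)[\sum_{i=1}^{j-1} iF_i(x_k) + (j/2)F_j(x_k)]$ for $k = 2j$ and $\gamma_k = (1 - x_k)\sum_{i=1}^{j} iF_i(x_k)$ for $k = 2j+1$, and finds $x_k$ by bisection. The function names are hypothetical.

```python
import math


def F(n, x):
    """F_n(x) from recursion (24): F_0 = 1, F_1 = x, F_n = 2(2 - x) F_{n-1} - F_{n-2}."""
    f_prev, f = 1.0, x
    if n == 0:
        return f_prev
    for _ in range(n - 1):
        f_prev, f = f, 2 * (2 - x) * f - f_prev
    return f


def solve_xk(k, iterations=200):
    """Root x_k of F_k(x) = 1 in (0, 1); x = 1 is excluded from the bracket."""
    lo, hi = 0.0, 1.0 - 1e-6
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if F(k, mid) - 1 < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def artin_drift(k):
    """Drift of the simple random walk on A_k with respect to {a, b, a^-1, b^-1}."""
    x = solve_xk(k)
    j = k // 2
    if k % 2 == 0:
        # k = 2j: the middle term F_j carries weight j/2.
        total = sum(i * F(i, x) for i in range(1, j)) + (j / 2) * F(j, x)
    else:
        # k = 2j + 1: plain weighted sum up to j.
        total = sum(i * F(i, x) for i in range(1, j + 1))
    return (1 - x) * total


if __name__ == "__main__":
    for k in range(3, 9):
        print(k, artin_drift(k))
```

For $k = 4$ the routine returns $(\sqrt{5} - 1)^2/4 = (3 - \sqrt{5})/2$, which follows from the formula with $x_4 = (3 - \sqrt{5})/2$.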



7. From groups to monoids

A pair $(M, \Sigma)$ formed by a monoid and a finite set of generators is 0-automatic if
the set of locally reduced words $L(M, \Sigma)$ is a cross-section and the analog of (7)
holds. A plain monoid together with the natural generators forms a 0-automatic
pair. Define the Traffic Equations as in (14) with the convention that $\mu(u^{-1}) = 0$
if $u$ has no inverse. Then Theorem 5.1 and Proposition 5.2 hold for monoids. (But
not Propositions 5.3 and 5.4.)

To illustrate, consider a free product of the form $M = M_a \star M_b \star M_c \star M_d$
where $M_i$ is equal either to $Z_2$ or to $B$. Here $B = \langle \, a \mid a^2 = a \, \rangle$ is the Boolean
monoid and $Z_2$ = Z/2Z. Let $i$ be the generator of $M_i$. We consider the simple
random walk on $M$, that is the random walk defined by $\mu$: $\mu(i) = 1/4$, $\forall i$. For
$Z_2 \star Z_2 \star Z_2 \star Z_2$ and $B \star B \star B \star B$, it is elementary that $\gamma$ equals respectively 1/2
and 3/4. The other values of the drift are given below.





         Z_2 ★ Z_2 ★ Z_2 ★ B    Z_2 ★ Z_2 ★ B ★ B    Z_2 ★ B ★ B ★ B
    γ    (12 + 3√2)/28          (6 + √3)/12          7/10



The value 7/10 can be obtained by elementary arguments without having to 
solve the Traffic Equations. But not the other two values. 

Acknowledgement. The authors would like to thank both W. Woess and an anony- 
mous referee for pointing out the method of [21] for computing the drift, see §6.1; 
as well as several other references. 









References 

[1] A. Avez. Entropie des groupes de type fini, C. R. Acad. Sci. Paris Ser. A-B, 
275:1363-1366, 1972. 

[2] D. Cartwright. Some examples of random walks on free products of discrete groups. 
Ann. Mat. Pura Appl. (4), 151:1-15, 1988.

[3] D. Cartwright and P. Soardi. Random walks on free products, quotients and amal- 
gams. Nagoya Math. J., 102:163-180, 1986. 

[4] P. de la Harpe. Topics in geometric group theory. Chicago Lectures in Mathematics. 
University of Chicago Press, 2000. 

[5] Y. Derriennic. Quelques applications du théorème ergodique sous-additif. Astérisque,
74:183-201, 1980.

[6] E. Dynkin and M. Malyutov. Random walk on groups with a finite number of generators.
Sov. Math. Dokl., 2:399-402, 1961.

[7] D. Epstein, J. Cannon, D. Holt, S. Levy, M. Paterson, and W. Thurston. Word 
processing in groups. Jones and Bartlett, Boston, 1992. 

[8] H. Furstenberg. Noncommuting random products. Trans. Amer. Math. Soc., 108: 
377-428, 1963. 

[9] F. Garside. The braid groups and other groups. Quart. J. Math. Oxford, 20:235-254, 
1969. 

[10] M. Gromov. Hyperbolic groups. In Essays in group theory, volume 8 of Math. Sci. 
Res. Inst. Publ., pages 75-263. Springer, 1987. 

[11] Y. Guivarc’h. Sur la loi des grands nombres et le rayon spectral d’une marche 
aleatoire. Asterisque, 74:47-98, 1980. 

[12] R. Haring-Smith. Groups and simple languages. Trans. Amer. Math. Soc., 279(1): 
337-356, 1983. 

[13] V. Kaimanovich. The Poisson formula for groups with hyperbolic properties. Ann. 
of Math. (2), 152(3):659-692, 2000. 

[14] S. Lalley. Finite range random walk on free groups and homogeneous trees. Ann. 
Probab., 21(4):2087-2130, 1993. 

[15] S. Lalley. Random walks on regular languages and algebraic systems of generating 
functions. In Algebraic methods in statistics and probability, volume 287 of Contemp. 
Math., pages 201-230. Amer. Math. Soc., 2001. 

[16] F. Ledrappier. Some asymptotic properties of random walks on free groups. In J. Taylor,
editor, Topics in probability and Lie groups: boundary theory, number 28 in CRM
Proc. Lect. Notes, pages 117-152. American Mathematical Society, 2001.

[17] J. Mairesse. Random walks on groups and monoids with a Markovian harmonic 
measure. In preparation, 2004. 

[18] J. Mairesse and F. Matheus. Random walks on free products of cyclic groups and 
on Artin groups with two generators. In preparation, 2004. 

[19] T. Nagnibeda and W. Woess. Random walks on trees with finitely many cone types. 
J. Theoret. Probab., 15(2):383-422, 2002. 

[20] S. Nechaev and R. Voituriez. Random walks on three-strand braids and on related 
hyperbolic groups. J. Phys. A, 36(1):43-66, 2003.

[21] S. Sawyer and T. Steger. The rate of escape for anisotropic random walks in a tree. 
Probab. Theory Related Fields, 76(2):207-230, 1987. 

[22] J. Stallings. A remark about the description of free products of groups. Proc. Cam- 
bridge Philos. Soc., 62:129-134, 1966. 

[23] C. Takacs. Random walk on periodic trees. Electron. J. Probab., 2:no. 1, 1-16, 1997. 

[24] A. Vershik. Dynamic theory of growth in groups: Entropy, boundaries, examples.
Russ. Math. Surv., 55(4):667-733, 2000. Translation from Usp. Mat. Nauk 55(4):59-128,
2000.







[25] W. Woess. Nearest neighbour random walks on free products of discrete groups. 
Boll. Un. Mat. Ital. B (6), 5(3):961-982, 1986.

[26] W. Woess. Random walks on infinite graphs and groups. Number 138 in Cambridge 
Tracts in Mathematics. Cambridge University Press, 2000. 

Jean Mairesse 

LIAFA, CNRS-Universite Paris 7, case 7014, 2, place Jussieu, 75251 Paris Cedex 
05, France. Jean.Mairesse@liafa.jussieu.fr 

Frederic Matheus 

LMAM, Université de Bretagne-Sud, Campus de Tohannic, BP 573, 56017 Vannes,
France. Frederic.Matheus@univ-ubs.fr




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Nested Regenerative Sets and Their Associated 
Fragmentation Process 

Philippe Marchal 

ABSTRACT: We give a fractal construction of nested, stable regenerative
sets and study the associated inhomogeneous fragmentation process.



1. Introduction 

The purpose of this paper is to introduce an explicit construction of nested, stable 
regenerative sets and to study the related fragmentation process. The complete 
construction of regenerative sets on the whole half-line is explained in the next 
section. However, the intersections of these sets with the interval [0, 1) admit the 
following description as random fractal sets. Let us introduce the 

SPLITTING PROCEDURE. Given $\alpha \in [0,1]$, this procedure takes an interval
$[a, b)$, cuts it into two intervals $[a, U)$, $[U, b)$, where $U$ is uniformly distributed on
$[a, b)$, and returns $[a, U)$, $[U, b)$ with probability $\alpha$ and $[a, U)$ with probability $1 - \alpha$,
independently of $U$.

Fix $\alpha \in [0,1]$ and use the splitting procedure to construct a

RANDOM FRACTAL SET. We construct recursively a sequence $(S_n(\alpha), n \geq 0)$
where for each $n$, $S_n(\alpha)$ is a collection of intervals of $[0,1)$. Put $S_0(\alpha) = \{[0,1)\}$.
Then to obtain $S_{n+1}(\alpha)$ from $S_n(\alpha)$, apply to each interval of $S_n(\alpha)$ the splitting
procedure with parameter $\alpha$, independently.
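
The recursive construction is easy to simulate. The sketch below is a minimal illustration, assuming the splitting procedure keeps both pieces with probability $\alpha$ and only the left piece $[a, U)$ otherwise; intervals are represented as pairs of floats, and the names `split` and `iterate_splitting` as well as the `seed` parameter are hypothetical.

```python
import random


def split(interval, alpha, rng):
    """Apply the splitting procedure to one interval [a, b)."""
    a, b = interval
    u = rng.uniform(a, b)            # cut point, uniform on the interval
    if rng.random() < alpha:
        return [(a, u), (u, b)]      # keep both pieces with probability alpha
    return [(a, u)]                  # otherwise keep only the left piece


def iterate_splitting(alpha, n, seed=0):
    """Return S_n(alpha) as a list of intervals ordered from left to right."""
    rng = random.Random(seed)
    intervals = [(0.0, 1.0)]         # S_0(alpha) = {[0, 1)}
    for _ in range(n):
        intervals = [piece for iv in intervals for piece in split(iv, alpha, rng)]
    return intervals


if __name__ == "__main__":
    s = iterate_splitting(0.5, 8)
    print(len(s), sum(b - a for a, b in s))
```

With $\alpha = 1$ every interval is cut in two and nothing is discarded, so $S_n(1)$ has $2^n$ intervals of total length 1; with $\alpha = 0$ only one interval survives, always starting at 0.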



Theorem 1.1. Set 



Let $T$ be a stable subordinator with index $\alpha$ and let $R = \{T_t, t \geq 0\}$ be its range.
Then

$$R \cap [0,1) = S_\infty(\alpha) .$$

A corollary is the following representation of the generalized arcsine law:

Corollary 1.2. Let $I_0 = [0,1]$. Define by induction a sequence of intervals $I_n =
[a_n, b_n]$ as follows.

(i) Let $V_n$ be a uniform random variable on $I_n$, independent of the past.

(ii) Set $I_{n+1} = [a_n, V_n]$ (resp. $I_{n+1} = [V_n, b_n]$) with probability $1 - \alpha$ (resp. $\alpha$),
independently of the past.

Then the point $I_\infty = \cap_n I_n$ is distributed according to the generalized arcsine
law with parameter $\alpha$, that is, the law on $[0,1]$ with density

$$\frac{t^{\alpha - 1}(1 - t)^{-\alpha}}{\Gamma(\alpha)\Gamma(1 - \alpha)} . \qquad (1)$$
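
Corollary 1.2 gives a simple sampler for the generalized arcsine law. The sketch below is a Monte Carlo illustration, not a proof: it runs the nested interval recursion for a fixed number of steps (an assumption; the true limit needs infinitely many) and compares the empirical mean with $\alpha$, which is the mean of the density $t^{\alpha-1}(1-t)^{-\alpha}/(\Gamma(\alpha)\Gamma(1-\alpha))$, a Beta$(\alpha, 1-\alpha)$ law. The function names and the default parameters are hypothetical.

```python
import random


def sample_limit_point(alpha, rng, steps=60):
    """Approximate I_infinity by running the recursion of Corollary 1.2 for `steps` steps."""
    a, b = 0.0, 1.0
    for _ in range(steps):
        v = rng.uniform(a, b)        # V_n uniform on I_n = [a_n, b_n]
        if rng.random() < alpha:
            a = v                    # keep [V_n, b_n] with probability alpha
        else:
            b = v                    # keep [a_n, V_n] with probability 1 - alpha
    return (a + b) / 2


def empirical_mean(alpha, n_samples=20000, seed=0):
    """Average of n_samples independent approximate draws of I_infinity."""
    rng = random.Random(seed)
    return sum(sample_limit_point(alpha, rng) for _ in range(n_samples)) / n_samples


if __name__ == "__main__":
    for alpha in (0.2, 0.5, 0.8):
        print(alpha, empirical_mean(alpha))
```

Each step multiplies the interval length by an independent uniform variable, so after 60 steps the interval is far shorter than floating point resolution can matter for the mean.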







Our main interest is to investigate the following 

COUPLED CONSTRUCTION. We can perform the splitting procedure with parameter
$\alpha \in [0,1]$ by taking an independent, uniform random variable $V$ on $[0,1]$
and deciding to return one or two sets according as $V > \alpha$ or $V \leq \alpha$. Proceeding this
way, we see that we obtain a coupled construction of $(S_n(\alpha), n \geq 0)$, and therefore
of $S_\infty(\alpha)$, simultaneously for all $\alpha \in [0,1]$, by starting at $\alpha = 1$.

It is clear that $(S_\infty(\alpha), \alpha \in [0,1])$ is a nested family of regenerative sets,
i.e. if $\alpha < \alpha'$, $S_\infty(\alpha) \subset S_\infty(\alpha')$. It can be shown [7] that this family has the
subordination property, that is, if $\alpha < \alpha'$, $S_\infty(\alpha)$ can be obtained from $S_\infty(\alpha')$ by
subordination. See [1] for details on subordination.

Moreover, for each $\alpha$, the regenerative set $S_\infty(\alpha)$ cuts the interval $[0,1]$ into
disjoint intervals. Consider the coupled construction with $\alpha$ going from 1 to 0. We
obtain this way a process where the intervals delimited by $S_\infty(\alpha)$ merge together.
The subordination property entails [4] that this process is a Markov process known
as the Bolthausen-Sznitman coalescent. This process was introduced in [5] for the
study of spin glasses. See also [3, 6].

We shall focus here on the time reversal, that is, we let $\alpha$ go from 0 to 1, in
which case we obtain a process where the intervals are broken into smaller ones.
It turns out that this process has the law of an inhomogeneous fragmentation and
our construction enables us to study its dislocation measure.

The whole construction of regenerative sets is explained in the next section. 
We give some results on random partitions in Section 3 and study more specifically 
the fragmentation process in Section 4. 



2. The construction of stable regenerative sets 

We give here the construction of stable, regenerative sets on the positive half- 
line. It is easy to check that if we consider the intersection of these sets with the 
interval [0, 1), we recover the fractal construction described in the introduction. In
particular, Theorem 1.1 is a direct corollary of Theorem 2.1.

2.1. Regenerative sets 

For a detailed account on regenerative sets, see for instance [1]. A random subset
$\mathcal{R}$ of $\mathbb{R}_+$ is regenerative if and only if $0 \in \mathcal{R}$ and for every $t > 0$, conditionally
on $t \in \mathcal{R}$, the subset $\mathcal{R} \cap [t, \infty)$ has the same law as $\mathcal{R} + t$ and is independent of
$\mathcal{R} \cap [0, t]$. A classical result tells us that every regenerative set can be viewed as
the range of a subordinator.

Moreover, if $\mathcal{R}$ is self-similar, i.e. if $\mathcal{R}$ has the same law as $c\mathcal{R}$ for every $c > 0$,
then $\mathcal{R}$ is the range of a stable subordinator. In that case, to identify the index of
the subordinator, one can use for instance the generalized arcsine law. For every
time $t$ set $g_t = \sup(\mathcal{R} \cap [0, t])$. Then $g_1$ (and more generally, by self-similarity, $g_t/t$)
is distributed according to the generalized arcsine law with parameter $\alpha$, i.e. the
law whose density is given by (1).

2.2. The construction 

Let $N$ be a Poisson point process on $\mathbb{R}_+ \times \mathbb{R}_+$ with intensity $d\mu = dt \otimes x^{-2}dx$.
Call $C$ the (random) support of the measure $N$. To each point $M \in C$ associate an
independent, uniform random variable $U(M)$ on $[0,1]$. Note that almost surely, the
coordinates of the points of $C$ are all different, there are infinitely many points in




Nested Regenerative Sets 






Figure 2.1 Percolating points of the Poisson point process. 




each strip $[s, t] \times \mathbb{R}_+$ ($s < t$) and only finitely many points in each strip $[s, t] \times [y, \infty)$
($y > 0$). We shall implicitly assume all these properties in the sequel (that is, we
shall often omit to write “almost surely”).

For $s < t \in \mathbb{R}_+$ let $M(s,t)$ be the highest point of $C$ between times $s$ and $t$.
Formally, $M(s,t)$ is the point of coordinates $(u, y)$ such that $M(s,t) \in C$, $s < u < t$
and for each point $M' = (u', y') \in C$ with $s < u' < t$, one has $y' < y$.

One can then define $M_1(s,t) := (s_1, y_1) = M(s,t)$ and by induction, $M_{n+1}(s,t)
:= (s_{n+1}, y_{n+1}) = M(s_n, t)$. The sequence is finite if and only if there is a point
of $C$ with x-coordinate $t$. We say that $t \in \mathbb{R}_+$ percolates at parameter $\alpha \in [0,1]$ if
for each $n$, $U(M_n(0,t)) \leq \alpha$. By convention, we say that 0 percolates.

See the figure. The black points are those which percolate, the white ones are
those for which $U > \alpha$ and which, consequently, do not percolate, and the crossed
ones are those for which $U \leq \alpha$ but which do not percolate. The set of percolating
points is the closure of the set of x-coordinates of the black points.

Loosely speaking, if $U(M) > \alpha$, $M$ does not percolate. In turn, when $U(M) \leq
\alpha$, $M$ “looks to the left”. If the first vertical line seen by $M$ is either the y-axis
or a vertical line of a point which percolates, then $M$ percolates. Otherwise, if
this first vertical line seen by $M$ is the line of a point which does not percolate,
then $M$ does not percolate. The procedure to determine whether $M$ percolates is
well-defined since there is only a finite number of points at a higher altitude than
$M$ and on the left of $M$.

2.3. Regenerativity and stability 

Theorem 2.1. The set of points percolating at parameter $\alpha$ has the law of the
range of a stable subordinator with index $\alpha$.

Proof:

(i) First we prove that the set of points percolating at parameter $\alpha$ is regenerative.
Indeed, assume that $s$ percolates at parameter $\alpha$. Remark that for
every $t$, the x-coordinates of the points $M_n(0,t)$ converge to $t$ almost surely as $n$
increases. Hence for every $t > s$, there exists a minimal $k = k(s,t)$ such that
for $n > k$, the x-coordinate of $M_n(0,t)$ is greater than $s$. Then for $n \leq k$, $M_n(0,t) = M_n(0,s)$
and for $n > k$, $M_n(0,t) = M_{n-k}(s,t)$.







Since $s$ percolates, one has $U(M_n(0,t)) \leq \alpha$ for every $n \leq k$. Hence conditionally
on the event that $s$ percolates, $t$ percolates if and only if for each $n > k$,
$U(M_{n-k}(s,t)) \leq \alpha$. In other words, the set of times $t > s$ percolating at parameter
$\alpha$ is the set of $t$ such that $U(M_n(s,t)) \leq \alpha$ for every $n$. This is independent of
$\mathcal{R}_\alpha \cap [0, s]$ because of the independence properties of the Poisson point process $N$
and has the same law as $\mathcal{R}_\alpha + s$ because of the translation-invariance of $N$.

(ii) The self-similarity is quite obvious. Indeed, write $N = \sum_k \delta_{(t_k, x_k)}$ and
set $N' = \sum_k \delta_{(ct_k, cx_k)}$ for some $c > 0$. Since $\mu(B) = \mu(cB)$ for every Borel set
$B \subset \mathbb{R}_+ \times \mathbb{R}_+$, $N'$ has the same law as $N$. Therefore $\mathcal{R}_\alpha$ has the same law as $c\mathcal{R}_\alpha$
for every $c > 0$ and consequently, $\mathcal{R}_\alpha$ is the range of a stable subordinator.

(iii) It remains to prove that the parameter of the subordinator is $\alpha$. Let
$g_1 = \sup\{t \leq 1, t \in \mathcal{R}_\alpha\}$. Define by induction the sequence $A_n := (u_n, z_n)$ by
$A_1 = M(0,1)$ and $A_{n+1} = M(0, u_n)$. Then by construction, $g_1 \in [u_{K+1}, u_K]$ if $K$
is defined as the smallest integer such that $U(A_{K+1}) \leq \alpha$. Remark that $u_n$ is the
product of $n$ independent, uniform random variables on $[0,1]$ and therefore has
density $(-\log t)^{n-1}/(n-1)!$. Hence the distribution of $u_K$ is given by

$$P(u_K \in dt) = \sum_{n \geq 1} P(K = n, u_n \in dt) = \sum_{n \geq 1} \alpha(1-\alpha)^n \frac{(-\log t)^{n-1}}{(n-1)!} \, dt = \alpha(1-\alpha)\exp[(\alpha - 1)\log t] \, dt = \alpha(1-\alpha) t^{\alpha-1} dt$$

and consequently $P(u_K < \varepsilon) \sim c\varepsilon^\alpha$ as $\varepsilon$ goes to 0. Similar calculations show
that $P(u_{K+1} < \varepsilon) \sim c'\varepsilon^\alpha$ as $\varepsilon$ goes to 0. Hence $P(g_1 < \varepsilon) \asymp \varepsilon^\alpha$. Comparing this
with the generalized arcsine law, we conclude that the index of the subordinator
is $\alpha$. □

3. Partitions

3.1. Partitions of integers 

For more details on random partitions, as well as numerous references, we refer to 
Pitman’s Cours de St-Flour [9]. We consider exchangeable random partitions of 
the set N, i.e. random partitions whose law is invariant under any permutation of 
a finite number of integers. 

An interesting construction of random partitions is the so-called “Chinese
restaurant” process [9]. The model is parameterized by two reals $\alpha, \theta$ and an initial
condition $(a_1, \ldots, a_{k_0})$ where $k_0$ and $a_1, \ldots, a_{k_0}$ are positive integers. We must have
$0 \leq \alpha \leq 1$ and $\theta > -k_0\alpha$.

Imagine a restaurant with infinitely many tables, labeled by the integers.
There are initially $a_1$ customers at the first table, $a_2$ customers at the second
table and so on. Then new customers arrive and sit at some table according to the
following rule.

Suppose that at a given moment, $n$ customers are seated and occupy $k$ tables,
the number of customers at each table being $n_1, \ldots, n_k$ respectively with $n_1 + \ldots +
n_k = n$. Then the $(n+1)$-th customer sits at table number $i$, $1 \leq i \leq k$, with
probability $(n_i - \alpha)/(n + \theta)$, and at table number $k + 1$ with probability $(k\alpha +
\theta)/(n + \theta)$.
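
The seating rule can be sketched in a few lines. This is an illustration only, assuming the probabilities $(n_i - \alpha)/(n + \theta)$ for an occupied table and $(k\alpha + \theta)/(n + \theta)$ for a new one, and starting from the initial condition (1), one customer at the first table; the function names and the `seed` parameter are hypothetical.

```python
import random


def seating_probabilities(counts, alpha, theta):
    """Probabilities for the next customer: one entry per occupied table, then a new table."""
    n = sum(counts)
    k = len(counts)
    probs = [(n_i - alpha) / (n + theta) for n_i in counts]
    probs.append((k * alpha + theta) / (n + theta))
    return probs


def chinese_restaurant(n_customers, alpha, theta, seed=0):
    """Seat customers one by one; return the list of table sizes."""
    rng = random.Random(seed)
    counts = [1]                                  # initial condition (1)
    while sum(counts) < n_customers:
        probs = seating_probabilities(counts, alpha, theta)
        table = rng.choices(range(len(probs)), weights=probs)[0]
        if table == len(counts):
            counts.append(1)                      # open a new table
        else:
            counts[table] += 1                    # join an occupied table
    return counts


if __name__ == "__main__":
    print(chinese_restaurant(50, 0.5, 0.0))
```

The probabilities always sum to 1, since $\sum_i (n_i - \alpha) + k\alpha + \theta = n + \theta$.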







Associate with this process a partition of $\mathbb{N}$ by saying that $i$ and $j$ are in
the same block of the partition if and only if the $i$-th and $j$-th customers sit at
the same table. Then one can check that this random partition is exchangeable,
provided that the first $a_1 + \ldots + a_{k_0}$ customers sit in an exchangeable manner. We
shall call an $(\alpha, 0)$-partition a partition derived from a Chinese restaurant with
parameters $(\alpha, 0)$ and initial condition (1), that is, one has initially one customer
at the first table and no other customers.

3.2. The partition associated with the construction 

We want to show that the Chinese restaurant can be embedded in the construction
of Section 2. From now on we restrict ourselves to the points of $C$ lying in the
strip $[0,1] \times \mathbb{R}_+$. Reorder these points by decreasing ordinate, denoting

$$P_1 = (t_1, y_1), \ \ldots, \ P_n = (t_n, y_n), \ \ldots$$

with $y_1 > y_2 > \cdots$. Add by convention $P_0 = (0, \infty)$.

Say that $P_m$ is on the left of $P_n$ if $t_m < t_n$.

SEATING RULE. We interpret $P_n$ as the $(n+1)$-th customer.

• The first customer sits at the first table.

• The $(n+1)$-th customer $P_n$ sits at a new table if $t_n$ percolates.

• Otherwise, $P_n$ sits at the same table as the leftmost customer on the right
of $P_n$.

This seating rule induces a partition and we have

Theorem 3.1. The seating rule described above yields the Chinese restaurant with
parameters $(\alpha, 0)$ and initial condition $k_0 = 1$, $a_1 = 1$.

3.3. Proof of Theorem 3.1 

For every $n \ge 2$, denote by $\sigma_n$ the permutation of $\{1, 2, \ldots, n\}$ such that

$t_{\sigma_n(1)} < \cdots < t_{\sigma_n(n)}$.

Set $s^n_i = t_{\sigma_n(i)}$ for $1 \le i \le n$. First we have:

Lemma 3.2. For every n and every permutation $\sigma$ of $\{1, \ldots, n\}$, conditionally on
$\sigma = \sigma_n$, the family of reals $(s^n_1, \ldots, s^n_n)$ has the law $\mathcal{U}$ of the increasing
reordering of n independent, uniform random variables on [0, 1].

Proof. By elementary results on Poisson point processes, the $t_i$'s are independent,
uniform random variables on [0, 1]. Hence $(s^n_1, \ldots, s^n_n)$ has the desired law $\mathcal{U}$.

Let $\sigma$ be a permutation of $\{1, \ldots, n\}$ and $P_\sigma$ be the law of the increasing
reordering conditionally on $\sigma_n = \sigma$. Remark that for every permutation $\tau$
of $\{1, \ldots, n\}$, $(t_1, \ldots, t_n)$ has the same law as $(t_{\tau(1)}, \ldots, t_{\tau(n)})$. This entails that
$P_\sigma = P_{\sigma \circ \tau}$ for every $\tau$. Thus $P_\sigma$ does not depend on $\sigma$ and equals $\mathcal{U}$. □

Let us introduce some more notation. The restaurant at parameter $\alpha$ induced
by our construction is formally a map $f^\alpha : \mathbb{N} \to \mathbb{N}$ depending on the Poisson point
process $\mathcal{N}$ and on the variables $(U(P_n), n \ge 1)$, where $f^\alpha(n) = k$ means that
the n-th customer sits at the k-th table. Denote by $f^\alpha_n$ the restriction of $f^\alpha$ to
$\{1, \ldots, n + 1\}$. Set $U = (U(P_n), n \ge 1)$ and $U_n = (U(P_m), 1 \le m \le n)$.

Remark that for every n, whether or not $P_n$ percolates is determined by
$(\sigma_n, U_n)$, since it only depends on the points of C on the left of $P_n$ and at a
higher altitude than $P_n$. Hence for every n, $f^\alpha_n$ is determined by $(\sigma_n, U_n)$. We
shall write $f^\alpha_n = H_n(\sigma_n, U_n)$. As $U_n$ and $\sigma_n$ are independent, the lemma entails







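that conditionally on $f^\alpha_{n-1}$, the variables $(s^{n-1}_1, \ldots, s^{n-1}_{n-1})$ are the increasing
reordering of $n - 1$ independent, uniform random variables on [0, 1]. In particular,
conditionally on $f^\alpha_{n-1}$,

$P\bigl(t_n \in (s^{n-1}_i, s^{n-1}_{i+1})\bigr) = 1/n$ (2)

for every $i \le n - 1$.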

Observe that according to the seating rule, the set of customers sitting at 
a given table at a given moment is a set of consecutive points if we reorder the 
customers by their x-coordinates. Moreover, it is easily seen by induction that if 
^?n+i 5 • • • x-coordinates of the customers sitting at the z-th 

table at time n, then s^, percolates, does not percolate for m < z < m' and 
percolates. 

As a consequence, the event that the (n + 1)-th customer sits at the i-th table
is the union of these two events: 

• <tn< Sl^,) 

According to (2), the probability of the first event is $(m' - m)/n$ and the
probability of the second event is $(1 - \alpha)/n$. Hence conditionally on $f^\alpha_{n-1}$, the
probability that the (n + 1)-th customer sits at the i-th table is

$\frac{m' - m}{n} + \frac{1 - \alpha}{n} = \frac{n_i - \alpha}{n}$

where $n_i = m' - m + 1$ is the number of customers sitting at table i at time n.
This is the probability defining the Chinese restaurant. This being true for every
table, we have proved Theorem 3.1. □
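
As a numerical companion to Theorem 3.1, the seating probabilities can be simulated directly. The sketch below does not reproduce the construction of Section 2. It assumes the usual Chinese restaurant rule with parameters $(\alpha, 0)$: with n customers seated at k tables, a table holding $n_i$ customers is joined with probability $(n_i - \alpha)/n$ and a new table is opened with probability $k\alpha/n$. The function name and the seed argument are illustrative choices.

```python
import random


def simulate_alpha_zero_restaurant(num_customers, alpha, seed=0):
    """Simulate table sizes of an (alpha, 0) Chinese restaurant.

    Assumption: 0 < alpha < 1 and the initial condition is one customer
    sitting alone at the first table.
    """
    rng = random.Random(seed)
    table_sizes = [1]
    for n in range(1, num_customers):  # n = number of customers already seated
        k = len(table_sizes)
        # Occupied table i: (n_i - alpha) / n ; new table: k * alpha / n.
        weights = [(size - alpha) / n for size in table_sizes] + [k * alpha / n]
        choice = rng.choices(range(k + 1), weights=weights)[0]
        if choice == k:
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[choice] += 1   # join an occupied table
    return table_sizes


if __name__ == "__main__":
    print(simulate_alpha_zero_restaurant(1000, alpha=0.5, seed=1))
```

The weights always sum to one because $\sum_i (n_i - \alpha) + k\alpha = n$, which is the normalisation implicit in the proof above.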



4. Fragmentation 

In this section, we shall have to consider cadlag functions and we denote as usual

$f(\alpha-) = \lim_{\beta \to \alpha,\ \beta < \alpha} f(\beta)$

4.1. Inhomogeneous fragmentations 

We refer to [2] for more details on fragmentation. Let $\mathcal{S}^\downarrow$ be the set of decreasing
sequences of reals $x_1 \ge x_2 \ge \cdots \ge 0$ with $\sum_n x_n \le 1$. We call a partition of mass a random
element of $\mathcal{S}^\downarrow$. If a partition of integers is exchangeable, it is a classical consequence
of De Finetti’s theorem that each block B of the partition almost surely has an
asymptotic frequency, which is the limit of $|B \cap \{1, 2, \ldots, n\}|/n$. This way one can
derive a partition of mass by reordering the asymptotic frequencies of the blocks in
decreasing order. In particular, given a Chinese restaurant with parameters $(\alpha, \theta)$
and initial condition $(a_1, \ldots, a_{k_0})$, we denote the law of the associated partition of
mass by

$M(\alpha, \theta; a_1, \ldots, a_{k_0})$.

Remark that translating Theorem 3.1 into a result for partitions of mass (rather
than partitions of the integers), we recover the Poisson-Dirichlet representation of
a partition derived from a stable subordinator, as established in [8].

A general fragmentation process $(X_t, 0 \le t \le 1)$ is a cadlag Markov process
such that:

(a) For every t, $X_t$ is an exchangeable partition of integers. For an integer i, we
denote by $B_t(i)$ the block containing i in the partition $X_t$.




Nested Regenerative Sets 






(b) If $s < t$, $X_t$ is a finer partition than $X_s$.

(c) If $B_t(i) \ne B_t(j)$, then conditionally on $X_t$, $(B_s(i), s \ge t)$ and
$(B_s(j), s \ge t)$ are independent.

The set $Dis(i)$ of dislocation times for i is the set of times t such that $B_t(i) \ne
B_{t-}(i)$. If $t \in Dis(i)$, consider the set of all blocks of $X_t$ contained in $B_{t-}(i)$ and
let $a_1(t, i) \ge a_2(t, i) \ge \cdots$ be the sequence of their asymptotic frequencies. Let $b_t(i)$
be the asymptotic frequency of $B_{t-}(i)$, set $b_n(t, i) = a_n(t, i)/b_t(i)$ and
$b(t, i) = (b_n(t, i), n \ge 1)$.

We say that a fragmentation process $(X_t, 0 \le t \le 1)$ is an inhomogeneous
fragmentation with dislocation measure $(\mu_t, 0 \le t \le 1)$ if for each t, $\mu_t$ is a
sigma-finite measure on $\mathcal{S}^\downarrow$ and if for each integer i

$((t, b(t, i)),\ t \in Dis(i))$

is a Poisson point process on $[0, 1] \times \mathcal{S}^\downarrow$ with intensity measure $\nu$ given by
$\nu(dt\,ds) = dt\,\mu_t(ds)$. We can now state:

Theorem 4.1. The process $(\mathcal{P}_\alpha, 0 \le \alpha \le 1)$ is an inhomogeneous fragmentation.
Its dislocation measure is given by 

^ 00 

z ^ ^ 1 ) 

1 — a 

n—l 

where Cn is given by 

S ~ 1 “ *) 

Remark. As shown in Section 3, the partition obtained at time $\alpha$ is an $(\alpha, 0)$-partition.
Moreover it is known [9] that if $\alpha' > \alpha$, one can obtain an $(\alpha', 0)$-partition
from an $(\alpha, 0)$-partition by splitting each block of the $(\alpha, 0)$-partition
according to an independent $(\alpha', -\alpha)$-partition. This suggests that, loosely speaking,
one could obtain $\mathcal{P}_{\alpha+}$ from $\mathcal{P}_{\alpha-}$ by an $(\alpha, -\alpha)$-partition. The expression of the
dislocation measure in Theorem 4.1 provides a rigorous version of this idea, whereas
an $(\alpha, -\alpha)$-partition does not make sense strictly speaking.

4.2. Proof of Theorem 4.1 

It is easy to verify that (a), (b) and (c) hold for the process $(\mathcal{P}_\alpha, 0 \le \alpha \le 1)$, using
the independence properties of the construction of Section 2. We want to show the
Markov property and the Poisson point process representation for the dislocations
of a given block. We shall study the evolution of the block containing 1 in the
partition $\mathcal{P}_\alpha$, denoted by $B(\alpha)$.

Recall that the integers in $B(\alpha)$ correspond to points of the Poisson point
process $\mathcal{N}$ and to the additional point $P_0 = (1, \infty)$ (see Section 2.2). We shall
identify $B(\alpha)$ with these points. Rank these points by decreasing z-coordinate,
denoting them by

$Q_0(\alpha) = (x_0(\alpha), z_0(\alpha)),\ Q_1(\alpha) = (x_1(\alpha), z_1(\alpha)), \ldots$

with $\infty = z_0(\alpha) > z_1(\alpha) > \cdots$. Set $H_1(\alpha) = 1$ and by induction
$H_{k+1}(\alpha) = \min\{n,\ x_n(\alpha) < x_{H_k(\alpha)}(\alpha)\}$.

Also, there is a natural order on $B(\alpha-)$ by ordering the points in decreasing
x-coordinate. Let $\prec_\alpha$ be this order.







By construction, for every k, $U(Q_{H_k(\alpha)}) > \alpha$, and it is easily seen that
$U(Q_{H_k(\alpha)})$ is uniformly distributed on $(\alpha, 1)$, independently of the fragmentation
up to time $\alpha$. In turn, for all $n \notin \{H_k(\alpha), k \ge 1\}$, $U(Q_n(\alpha))$ is uniformly
distributed on (0, 1), independently of the fragmentation up to time $\alpha$.

Moreover, it is easily seen that for every n, the restriction of $\prec_\alpha$ to $Q_1(\alpha)$,
..., $Q_n(\alpha)$ is a random total order, uniformly distributed over all possible total
orders and independent of the fragmentation up to time $\alpha$. We shall see that the
fragmentation process is determined by $\prec_\alpha$ and the variables $U(Q_n(\alpha))$, and since
they are independent of the fragmentation up to time $\alpha$, this entails the Markov
property.

So we have, for every k, $U(Q_{H_k(\alpha)}) > \alpha$ and $U(Q_{H_k(\alpha-)}) \ge \alpha$. There is a
dislocation at time $\alpha$ if for some $k \ge 1$, $U(Q_{H_k(\alpha-)}) = \alpha$. In that case, we say
that we have a dislocation of index $I(\alpha) = k$ and almost surely, for every $k' \ne k$,
$U(Q_{H_{k'}(\alpha-)}) > \alpha$. It is easy to check the following:

Proposition 4.2. For every $k \ge 1$, let $T_k$ be the set of dislocation times of index
k. Then the $T_k$'s are iid Poisson point processes on [0, 1] with intensity measure
$dx/(1 - x)$.

Proof: This is just due to the fact that, conditionally on $U(P) > \alpha$, $U(P)$ is
uniformly distributed on $(\alpha, 1)$. □
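
The intensity in Proposition 4.2 can be read off from a one-line hazard computation. The display below is a sketch for a single mark U, uniform on (0, 1), and is only meant to make the proof's remark explicit; it is not a substitute for the independence arguments of the construction.

```latex
% Hazard rate of a uniform mark U on (0,1): given U > x, U is uniform on (x,1).
\[
  P\bigl(U \in [x, x + dx] \,\big|\, U > x\bigr) = \frac{dx}{1 - x},
  \qquad 0 < x < 1 .
\]
% Consistency check: a Poisson process with intensity dx/(1-x) has no point
% in [0,a] with probability exp(-int_0^a dx/(1-x)) = 1 - a = P(U > a).
\[
  \exp\Bigl(-\int_0^{a} \frac{dx}{1 - x}\Bigr)
  = \exp\bigl(\log(1 - a)\bigr) = 1 - a = P(U > a).
\]
```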

If $\alpha$ is a dislocation time, let $\mathcal{P}'(\alpha)$ be the partition of $B(\alpha-)$ into blocks of
the partition $\mathcal{P}(\alpha)$.

Proposition 4.3. Assume that there is a dislocation at time $\alpha$. Then conditionally
on $I(\alpha) = k$ and $H_k(\alpha-) = k'$, the partition $\mathcal{P}'(\alpha)$ has the same law as the
partition derived from a Chinese restaurant with parameters $(\alpha, -\alpha)$ and initial
condition $k_0 = 2$, $a_1 = k'$, $a_2 = 1$.

Proof. The proof uses the same arguments as Theorem 3.1. Suppose that $I(\alpha) = k$
and $H_k(\alpha-) = k'$. Let $P = (x, z)$ be the left-most point percolating at $\alpha$ and such
that z > If there is no such point, put by convention P = (0, co). 

We view $B(\alpha-)$ as the set of customers of the Chinese restaurant. There are
initially two tables and $k' + 1$ customers. At the first table, $P_0$ and $Q_1(\alpha-), \ldots,
Q_{k'-1}(\alpha-)$ are seated. At the second table, $Q_{k'}(\alpha-)$ is seated.

We also have $k' + 2$ reals, namely $x, x_1, \ldots, x_{k'}, 1$, yielding $k' + 1$ intervals,
which we denote by $J_1(\alpha), \ldots, J_{k'+1}(\alpha)$ from left to right.

Let n be the minimal integer such that P^ = {tn^yn) satisfies tn > x and 
Vn < It is important to notice that we are considering Pny as defined in 

Section 3.2, and not $Q_n$. In particular $P_n$ may not belong to $B(\alpha-)$. Set $Q = P_n$.

By the same arguments as in the proof of Theorem 3.1, Q lies in one of the $k' + 1$
intervals $J_1(\alpha), \ldots, J_{k'+1}(\alpha)$ with equal probability. Then one has to distinguish
between 5 possibilities.

• $X \in J_1(\alpha)$ and $U(Q) < \alpha$. In that case, $Q \notin B(\alpha-)$.

• $X \in J_1(\alpha)$ and $U(Q) > \alpha$. In that case, Q sits at the second table.

• $X \in J_2(\alpha)$ and $U(Q) < \alpha$. In that case, Q sits at a new table.

• $X \in J_2(\alpha)$ and $U(Q) > \alpha$. Then Q sits at the first table.

• $X \notin (J_1(\alpha) \cup J_2(\alpha))$. Then Q sits at the first table.






Therefore the probabilities to sit at the first table, at the second table or at
a new table are proportional to $k' - \alpha$, $1 - \alpha$, $\alpha$ respectively. The same scheme
applies to the next customers, which proves the proposition. □
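
The three weights in this proof agree with the first step of a Chinese restaurant with parameters $(\alpha, -\alpha)$ started from two tables of sizes k' and 1. The sketch below checks this with exact rational arithmetic. It assumes the standard two-parameter seating rule, in which an occupied table of size $n_i$ has weight $n_i - \alpha$ and a new table has weight $\theta + k\alpha$, both divided by $n + \theta$; the function name is illustrative.

```python
from fractions import Fraction


def next_seat_probabilities(table_sizes, alpha, theta):
    """One step of the two-parameter Chinese restaurant (assumed standard rule).

    Returns (probabilities of joining each occupied table, probability of
    opening a new table) as exact fractions.
    """
    n = sum(table_sizes)
    k = len(table_sizes)
    denom = Fraction(n) + theta
    join = [(Fraction(size) - alpha) / denom for size in table_sizes]
    new = (theta + k * alpha) / denom
    return join, new


# Initial condition of Proposition 4.3 with k' = 4 and alpha = 1/3.
alpha = Fraction(1, 3)
join, new = next_seat_probabilities([4, 1], alpha, theta=-alpha)
print(join, new)  # weights proportional to k' - alpha, 1 - alpha, alpha
```

With $\theta = -\alpha$ and two occupied tables, the new-table weight $\theta + 2\alpha$ reduces to $\alpha$, which is why the value $-\alpha$ of the second parameter appears in the statement.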

Proposition 4.4. Assume that there is a dislocation at time a. Then 

Y,F{Hk{a-) = k'\I{a) = k)t’^' = t(-log(l - 
k' 

Proof: If $\prec$ is a total order on $\{1, 2, \ldots, n\}$, set $m_1 = 1$ and by induction,
$m_{i+1} = \min\{j > m_i,\ j \prec m_i\}$. Now let $D(k, k')$ be the set of total orders $\prec$ on
$\{1, 2, \ldots, k'\}$ such that $m_k = k'$. As an elementary application of the theory of
combinatorial species,

k'>i 

Let us apply this to the order $\prec_\alpha$. The event that $I(\alpha) = k$, $H_k(\alpha-) = k'$ is
exactly the event that

(A) $U(Q_{H_k(\alpha-)}(\alpha-)) = \alpha$ and for all $i \ne k$, $U(Q_{H_i(\alpha-)}(\alpha-)) > \alpha$.

(B) The order $\prec_\alpha$, restricted to $Q_1(\alpha-), \ldots, Q_{k'}(\alpha-)$, lies in $D(k, k')$.

On the other hand, the event that $\alpha$ is a dislocation time and that $I(\alpha) = k$
is exactly the event that (A) holds. Since the variables $U(P_n)$ are independent of
the Poisson point process $\mathcal{N}$, they are independent of $\prec_\alpha$, and therefore (A) and
(B) occur independently. As a consequence, the probability that (A) and (B) hold
given that (A) holds is the probability of (B). Finally, remark that the restriction
of $\prec_\alpha$ to $Q_1(\alpha-), \ldots, Q_{k'}(\alpha-)$ is uniformly distributed over the $k'!$ possible total
orders. This proves the proposition. □
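
For small k', the set D(k, k') can be enumerated by brute force, which gives a concrete check on the counting step of this proof. The sketch below encodes a total order on {1, ..., k'} by a rank tuple (an encoding chosen here, not taken from the text) and computes the indices $m_1 = 1$, $m_{i+1} = \min\{j > m_i,\ j \prec m_i\}$ literally. The comparison with unsigned Stirling numbers of the first kind is an observation added for illustration: asking that k' be the k-th such index forces k' to precede everything else and leaves k - 1 such indices among the first k' - 1 elements.

```python
from itertools import permutations


def record_indices(rank):
    """Indices m_1 = 1 < m_2 < ... where rank[j-1] is the position of j in the order."""
    records = [1]
    for j in range(2, len(rank) + 1):
        if rank[j - 1] < rank[records[-1] - 1]:  # j precedes the current m_i
            records.append(j)
    return records


def count_D(k, k_prime):
    """Number of total orders on {1, ..., k'} with m_k = k' (brute force)."""
    count = 0
    for rank in permutations(range(k_prime)):
        m = record_indices(rank)
        if len(m) >= k and m[k - 1] == k_prime:
            count += 1
    return count


def stirling_cycle(n, j):
    """Unsigned Stirling number of the first kind c(n, j)."""
    if n == 0:
        return 1 if j == 0 else 0
    if j == 0:
        return 0
    return stirling_cycle(n - 1, j - 1) + (n - 1) * stirling_cycle(n - 1, j)


if __name__ == "__main__":
    for k_prime in range(1, 6):
        print(k_prime, [count_D(k, k_prime) for k in range(1, k_prime + 1)])
```

Dividing `count_D(k, k_prime)` by k'! gives the probability of event (B) under the uniform distribution on total orders used above.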

Propositions 4.2, 4.3 and 4.4, together with obvious independence properties
of $\mathcal{N}$, entail that the process of dislocations of the block containing 1 follows the
law induced by the dislocation measure $\nu$.

Moreover, it is easily seen that the law of dislocations for the block containing
i is the same for all integers i. This concludes the proof of Theorem 4.1.



References 

[1] Bertoin, Jean. Subordinators: examples and applications. Lectures on probability 
theory and statistics (Saint-Flour, 1997), 1-91, Lecture Notes in Math., 1717, 
Springer, Berlin, 1999. 

[2] Bertoin, Jean. Homogeneous fragmentation processes. Probab. Theory Related Fields
121 (2001), no. 3, 301-318.

[3] Bertoin, Jean; Le Gall, Jean-François. The Bolthausen-Sznitman coalescent and the
genealogy of continuous-state branching processes. Probab. Theory Related Fields
117 (2000).

[4] Bertoin, Jean; Pitman, Jim. Two coalescents derived from the ranges of stable
subordinators. Electron. J. Probab. 5 (2000), no. 7.

[5] Bolthausen, E.; Sznitman, A.-S. On Ruelle’s probability cascades and an abstract
cavity method. Comm. Math. Phys. 197 (1998), no. 2, 247-276.

[6] Bovier, Anton; Kurkova, Irina. Derrida’s generalized random energy models 4:
Continuous state branching and coalescents. Preprint.







[7] Marchal, P. Regenerative sets, random partitions and the Bolthausen-Sznitman 
coalescent. Preprint 

[8] Perman, Mihael; Pitman, Jim; Yor, Marc. Size-biased sampling of Poisson point 
processes and excursions. Probab. Theory Related Fields 92 (1992), no. 1, 21-39. 

[9] Pitman, Jim. Combinatorial stochastic processes (Saint-Flour, 2002). To appear. 
Lecture Notes in Math., Springer. 

Philippe Marchal 

DMA, Ecole Normale Superieure, 75005 Paris, France. 

Philippe.Marchal@ens.fr




Part VII 

Applications 




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Real Numbers with Bounded Digit Averages

Eda Cesaratto and Brigitte Vallee 



ABSTRACT: This paper considers numeration schemes, defined in terms of dynamical
systems, and studies the set of reals which obey some constraints on their digits. In this
general setting, (almost) all sets have zero Lebesgue measure, even though the nature
of the constraints and the numeration schemes are very different. Sets of zero measure
appear in many areas of science, and Hausdorff dimension has proved to be an appropriate
tool for studying their nature. Classically, the studied constraints involve each digit in an
independent way. Here, more global conditions are studied, which only provide constraints
on each digit prefix. The main example of interest deals with reals all of whose digit
prefix averages in their continued fraction expansion are bounded by M. More generally,
a weight function is defined on the digits, and the weighted average of each prefix has to
be bounded by M. This setting can be translated in terms of random walks where each
step performed depends on the present digit, and walks under study are constrained to
be always under a line of slope M. We first provide a characterization of the Hausdorff
dimension $s_M$, in terms of the dominant eigenvalue of the weighted transfer operator
relative to the dynamical system, in a quite general setting. We then come back to
our main example: with the previous characterization at hand and use of the Mellin
transform, we exhibit the behaviour of $|s_M - 1|$ when the bound M becomes large. Even
if this study seems closely related to previous works in Multifractal Analysis, it is in a
sense complementary, because it uses weights on digits which grow faster and deals with
different methods. This paper only presents a detailed abstract of this study; it describes
the main tools and the sketches of proofs whereas a full paper [6] contains all the proofs
of the statements which are briefly described here.



1. Description of the framework 

1.1. Numeration schemes 

A numeration process associates to each real number x of the unit interval I a
sequence of digits $(m_1(x), m_2(x), \ldots, m_n(x), \ldots)$ where each $m_i$ belongs to some
alphabet $\mathcal{M} \subset \mathbb{N}^*$, finite or denumerable. We consider here such processes which
are defined in terms of dynamical systems. For a readable treatment of dynamical
systems of the interval, see [24].

Definition 1. [Good Class] A dynamical system of the Good Class is defined by
four elements

(i) An alphabet $\mathcal{M}$ included in $\mathbb{N}^*$, whose elements are called digits.

(ii) A topological partition of $I := [0, 1]$ with disjoint open intervals $I_m$,
$m \in \mathcal{M}$, i.e. $[0, 1] = \bigcup_{m \in \mathcal{M}} \overline{I_m}$; the length of the interval $I_m$ is denoted by $p_m$.
(iii) A mapping $\rho$ which is constant and equal to m on each $I_m$.
(iv) A mapping T, often called the shift, whose restriction to each $I_m$ is a
bijection from $I_m$ to I. Let $h_m$ be the inverse branch of T restricted to $I_m$.
The mappings $h_m$ satisfy the following:







(a) [Contraction.] For each $m \in \mathcal{M}$, there exist $\eta_m, \delta_m$ with $0 < \eta_m \le \delta_m < 1$ for
which $\eta_m \le |h'_m(x)| \le \delta_m$ for $x \in I$. The quantity $\delta := \sup\{\delta_m;\ m \in \mathcal{M}\}$ satisfies $\delta < 1$.

(b) [Bounded Distortion Property.] There exists a constant $r > 0$, called the
distortion constant, such that $|h''_m(x)| \le r\,|h'_m(x)|$ for all m and for all $x \in I$.

(c) [Convergence on the left of s = 1.] There exists $\sigma < 1$ for which the series
$\sum_{m \in \mathcal{M}} p_m^\sigma$ is convergent. The infimum $\sigma_0$ of such $\sigma$ is the abscissa of convergence.

With a system of the Good Class, a representation scheme for real numbers of I is
built as follows: We relate to x its trajectory $\mathcal{T}(x) = (x, T(x), T^2(x), \ldots, T^n(x), \ldots)$
which can be encoded by the (infinite) sequence of the digits produced by applying
the map $\rho$ on each element $T^i(x)$ of the trajectory,

$(m_1(x), m_2(x), \ldots, m_n(x), \ldots)$, with $m_i(x) := \rho(T^{i-1}(x))$. (1)

This framework provides numeration processes where the distribution of the n-th digit
$m_n$ may depend on the whole previous history.

Each branch (or inverse branch) of the n-th iterate of the shift T is called a branch
of depth n. It is then associated in a unique way to an n-tuple $\mathbf{m} = (m_1, \ldots, m_n)$ of
length n, and is of the form $h_{\mathbf{m}} := h_{m_1} \circ h_{m_2} \circ \cdots \circ h_{m_n}$. The interval $I_{\mathbf{m}} := h_{\mathbf{m}}(]0, 1[)$
gathers all the reals x for which the sequence of the first n digits equals $\mathbf{m}$: it is
called the fundamental interval relative to $\mathbf{m}$. Its depth equals the length $|\mathbf{m}|$ of
prefix $\mathbf{m}$ and its Lebesgue measure denoted by $p_{\mathbf{m}}$ satisfies $p_{\mathbf{m}} \le \delta^{|\mathbf{m}|}$. The set of
inverse branches of depth n is exactly $\mathcal{H}^n$ and the set of all the inverse branches
of any depth is $\mathcal{H}^\star$. Distortion and contraction properties entail the existence of a
constant $L > 0$ such that

$\frac{1}{L} \le \left| \frac{h'(x)}{h'(y)} \right| \le L$, for any $h \in \mathcal{H}^\star$ and any $x, y \in I$. (2)



1.2. Main examples 

Here, as we explain next, we focus on dynamical systems relative to an infinite 
alphabet M. The most classical examples are memoryless sources (of Riemann 
type) and the Continued Fraction expansion. 

Continued fraction expansion. The shift T, also known as the Gauss map, is

$T(x) = \frac{1}{x} - \left\lfloor \frac{1}{x} \right\rfloor$ for $x \ne 0$, $T(0) = 0$. (3)

It is relative to the topological partition $I_m = (1/(m+1), 1/m)$. The inverse branch
of depth 1 associated to the digit m is the LFT (linear fractional transformation)
$h_m(z) = 1/(m + z)$. This map induces the numeration scheme related to continued
fraction expansion.
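
The digits produced by the Gauss map can be computed exactly for rational inputs. The following sketch iterates $T(x) = 1/x - \lfloor 1/x \rfloor$ and records $\lfloor 1/x \rfloor$ as the digit at each step; it also applies the inverse branches $1/(m + z)$ to rebuild x from its digits. The function names are illustrative, and exact fractions are used so that no rounding enters. A rational x reaches 0 after finitely many steps, so its digit sequence is finite.

```python
from fractions import Fraction


def cf_digits(x, max_digits):
    """Continued fraction digits of x in [0, 1), read along the Gauss map orbit."""
    digits = []
    while x != 0 and len(digits) < max_digits:
        inverse = 1 / x
        m = inverse.numerator // inverse.denominator  # floor(1/x): the digit
        digits.append(m)
        x = inverse - m                               # T(x) = 1/x - floor(1/x)
    return digits


def inverse_branch(m, z):
    """Inverse branch of depth 1 associated with the digit m."""
    return 1 / (m + z)


def from_digits(digits):
    """Compose inverse branches h_{m_1} o ... o h_{m_n} and evaluate at 0."""
    z = Fraction(0)
    for m in reversed(digits):
        z = inverse_branch(m, z)
    return z


if __name__ == "__main__":
    x = Fraction(3, 7)
    print(cf_digits(x, 10), from_digits(cf_digits(x, 10)))
```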

Memoryless dynamical systems. A dynamical system of the Good Class is memoryless
when the branches $h_m$ are affine. It is completely defined (up to isomorphism)
by the length $p_m = \delta_m$ of each interval $I_m$ of depth one, which equals the
probability $p_m$ of emitting m at each step of the process. The fundamental interval can
be chosen as $I_m := ]q_m, q_{m+1}[$ with $q_1 = 0$ and $q_m := \sum_{k<m} p_k$. A special type of
memoryless source is studied here as a main example: the Riemann type $\mathcal{R}(\alpha)$ (for
$\alpha > 1$) where the associated probabilities are

$p_m^{(\alpha)} := \frac{1}{\zeta(\alpha)} \frac{1}{m^\alpha}$. (4)

The affine approximation of the Gauss map is the memoryless system relative to
the partition $I_m = (1/(m+1), 1/m)$. The length $p_m$ equals $1/(m(m+1))$, and it
is of the same type as the system $\mathcal{R}(2)$.
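
A memoryless system is easy to iterate once the interval endpoints are known. The sketch below uses the lengths $1/(m(m+1))$ of the affine approximation of the Gauss map, laid out from left to right with endpoints $q_m = \sum_{k<m} p_k = 1 - 1/m$. Two conventions are assumptions made for the example: the branches are taken increasing, and intervals are treated as half-open $[q_m, q_{m+1})$ so that every x in [0, 1) gets a digit.

```python
from fractions import Fraction


def p(m):
    """Length of the m-th interval: 1 / (m (m + 1))."""
    return Fraction(1, m * (m + 1))


def q(m):
    """Left endpoint q_m = sum_{k < m} p_k, which telescopes to 1 - 1/m."""
    return 1 - Fraction(1, m)


def digit_and_shift(x):
    """Return the digit m with q_m <= x < q_{m+1} and the affine image of x."""
    m = 1
    while not (q(m) <= x < q(m + 1)):
        m += 1
    return m, (x - q(m)) / p(m)


def digits(x, n):
    """First n digits of x in [0, 1) for this memoryless system."""
    out = []
    for _ in range(n):
        m, x = digit_and_shift(x)
        out.append(m)
    return out


if __name__ == "__main__":
    print(digits(Fraction(3, 10), 4))
```

Because the branches are affine, the image of x is again uniformly spread over [0, 1) when x is, so successive digits are independent with distribution $(p_m)$.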




Hausdorff dimension and bounded digit averages 






1.3. Elementary constraints on numeration processes 

In this setting, it is now classical to study numbers x for which the sequence
(1) satisfies some particular constraints. The instance of Cantor sets where the
constraint is the same for each digit $m_i$ and only allows a subset A of possible
values is well known. In this case, the set of such constrained numbers $E_A$ has
zero measure, and it is thus of great interest to study its Hausdorff dimension. The
first study on the subject is relative to numeration in base b and due to Eggleston
[8]. The problem is now completely solved when the alphabet is finite. The case
when the alphabet is infinite is quite important since it contains a particular case
of great interest: the reals whose continued fraction expansion only contains digits
$m_i$ less than M. These reals are badly approximable by rationals, and intervene
in many contexts of number theory (see [31]). The case of an infinite alphabet
(even if the process is memoryless) is a little bit more difficult to deal with, and
the set A of constraints has to be made precise [27, 28]. In a quite general setting
(dynamical systems of the Good Class, “open” constraints), the question is solved.
The main tool is the constrained transfer operator $H_{s,A}$ defined by

$H_{s,A}[f] := \sum_{m \in A} |h'_m|^s \cdot f \circ h_m$.

For a dynamical system of the Good Class, for real values of parameter s, and on a
convenient functional space, the operator $H_{s,A}$ has a unique dominant eigenvalue
denoted by $\lambda_A(s)$. When the set A is “open”, there exists a (unique) real $s = t_A$
for which $\lambda_A(s) = 1$ and the Hausdorff dimension of $E_A$ equals $t_A$.
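
In the simplest situation this characterization can be evaluated by hand. The sketch below assumes a memoryless system (affine branches) and a finite set A of allowed digits: the constrained operator then maps constants to constants, so its dominant eigenvalue is the finite sum of $p_m^s$ over m in A, and the dimension is the root of that sum equal to 1. The example uses two branches of length 1/3, that is, the middle-thirds Cantor set; the bisection routine and its names are illustrative.

```python
import math


def dominant_eigenvalue(lengths, s):
    """Sum of p_m^s over the allowed digits (affine branches, finite A assumed)."""
    return sum(p ** s for p in lengths)


def dimension(lengths, tol=1e-12):
    """Solve dominant_eigenvalue(lengths, s) = 1 for s in [0, 1] by bisection.

    Assumes at least two allowed intervals whose total length is below 1,
    so that the eigenvalue decreases from above 1 at s = 0 to below 1 at s = 1.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if dominant_eigenvalue(lengths, mid) > 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(dimension([1 / 3, 1 / 3]), math.log(2) / math.log(3))
```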

The particular case of “constrained” continued fractions was extensively studied;
the pioneers were Jarnik [20], Besicovitch [4] and Good [11]. Then Cusick, Hirst
and Bumby made important contributions, and finally Hensley [13, 14] completely
solved the problem. In [32], this result was extended to the case of “periodic”
constraints.

Another general question of interest is the asymptotic behaviour of $\dim(E_A)$ when
the constraint becomes weaker (i.e., $A \to \mathcal{M}$). Then, the Hausdorff dimension tends
to 1, and the speed of convergence towards 1 is also an important question. In the
case of continued fractions, Hensley [14] studies the case when $A_M := \{1, 2, \ldots, M\}$
and exhibits the asymptotic behaviour of $t_M := \dim(E_{A_M})$,

$t_M = 1 - \frac{6}{\pi^2} \frac{1}{M} - \frac{72}{\pi^4} \frac{\log M}{M^2} + O\left(\frac{1}{M^2}\right)$ when $M \to \infty$. (5)

1.4. Costs and weighted prefix averages

We consider here constraints which are more general than previous ones. They
are defined by conditions which only bound all the weighted prefix averages and
appear in a natural way in the Multifractal Analysis framework [9, 10].

A digit-cost c relative to a dynamical system (I, T) is a strictly positive function
$c : \mathcal{M} \to \mathbb{R}^+$ which extends to a function $c : \mathcal{M}^\star \to \mathbb{R}^+$ via the additive property

for $\mathbf{m} = (m_1, m_2, \ldots, m_n)$, $c(\mathbf{m}) := \sum_{i=1}^{n} c(m_i)$. (6)

On each trajectory $\mathcal{T}(x)$ encoded by the sequence $(m_1(x), m_2(x), \ldots, m_n(x), \ldots)$
defined in (1), the weighted prefix average of length n is defined as

$M_n(x) := \frac{1}{n} \sum_{i=1}^{n} c(m_i(x))$, (7)

and we study here the set $F_M$ of reals for which all the $M_n(x)$ are bounded by M.
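
The constraint bears on every prefix, not only on the overall average, and a small computation makes the difference visible. The sketch below computes the weighted prefix averages of a finite digit sequence and tests the bound on each of them. The default cost c(m) = m and the function names are assumptions made for the example; a finite prefix can only show that the bound fails, or that it holds so far.

```python
from fractions import Fraction
from itertools import accumulate


def prefix_averages(digits, cost=lambda m: m):
    """Weighted prefix averages (1/n) * sum_{i <= n} cost(m_i) for n = 1, 2, ..."""
    totals = accumulate(cost(m) for m in digits)
    return [Fraction(total) / n for n, total in enumerate(totals, start=1)]


def satisfies_bound(digits, M, cost=lambda m: m):
    """True when every prefix average of the given digits is at most M."""
    return all(avg <= M for avg in prefix_averages(digits, cost))


if __name__ == "__main__":
    print(prefix_averages([1, 3, 2]))   # averages 1, 2, 2
    print(satisfies_bound([1, 3, 2], 2))
    print(satisfies_bound([3, 1, 1], 2))  # overall average 5/3, first prefix fails
```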







1.5. Triples of large growth 

The strength of these constraints depends on the relation between the cost c and
the occurrence probability of the digits. Consider the (initial) probability
distribution $p : k \mapsto p_k$ of digit $m_k$, together with the limit distribution $\bar p$ of the n-th
digit $m_n$ which always exists in the Good Class Setting. We are mostly interested
in the case of a dynamical system of the Good Class, with an infinite alphabet $\mathcal{M}$,
where the sequences $p_m$, $c(m)$ satisfy
(c1) $m \mapsto p_m$ is decreasing,

(c2) $m \mapsto c(m)$ is increasing to $+\infty$.

The mixed sequences $\pi_n := \min\{p_m;\ c(m) \le n\}$, $\bar\pi_n := \min\{\bar p_m;\ c(m) \le n\}$
summarize the balance between the increase of cost c and the decrease of distributions
$p, \bar p$, and the conditions

(c3) $\limsup \pi_n^{1/n} = 1$, or $\limsup \bar\pi_n^{1/n} = 1$

(which are equivalent for systems of the Good Class) informally express that the
increase of c is faster than the decrease of p. Condition (c3) is equivalent to
requiring that the convergence radii of the series U(z), V(z) equal 1, with

U{z) := V{z) := 23 _ 

n>l m>l 



Definition 2. Consider a triple (I, T, c) made with a system (I, T) of the Good
Class. If it does not satisfy Condition (c3), it is said to be of moderate growth
(GMG-setting). If it satisfies Conditions (c1), (c2), (c3), it is said to be of large growth
(GLG-setting). When, furthermore, the abscissa of convergence of the Dirichlet
series

$P(s) := \sum_{m \ge 1} c(m)\, p_m^s$ (8)

relative to a triple (I, T, c) of class GLG equals 1, the triple (I, T, c) is called a
boundary triple.

Here, we focus on the GLG-setting. In the boundary case, the (stationary) average
$\mu(c)$ of cost c (which equals P(1)) is infinite,

$\mu(c) := \sum_{k \ge 1} c(k)\, p_k = +\infty$. (9)

Examples of triples (I, T, c) of GLG-type. The boundary triple $\mathcal{BR}(\alpha)$ related to
the Riemann memoryless sources $\mathcal{R}(\alpha)$ defined by

$p_m^{(\alpha)} = \frac{1}{\zeta(\alpha)} \frac{1}{m^\alpha}$, $c(m) = m^{\alpha - 1}$ (10)

is studied here as a main example. Note that $\mathcal{BR}(2)$ provides an approximate
memoryless version of the boundary triple formed with the Gauss map (I, T)
together with the cost c(m) = m which will be one of the most interesting examples
of our study.



1.6. Subsets Fm and statement of the main results 

We wish to study the sets $F_M$ defined as follows:

Definition 3. [Set $F_M$] Consider a triple (I, T, c) of GLG type, and denote by $\gamma(c) > 0$
the minimal value of c. For any $M > \gamma(c)$, the set $F_M$ is the set of ordinary reals
x of I for which all the weighted averages $M_n(x)$ defined in (7) satisfy $M_n(x) \le M$.







The set $F_M$ can also be described in terms of random walks: To each real number
x, one associates the walk formed with points $(P_i(x))_{i \ge 0}$. One begins with $P_0(x) :=
(0, 0)$, and, at time i, one performs a step $P_i(x) - P_{i-1}(x) := (1, c(m_i(x)))$. The set
$F_M$ is the set of reals x for which the walk $(P_i(x))_{i \ge 0}$ is always under the line of
slope M.

For any $M \in ]\gamma(c), \mu(c)[$, the Lebesgue measure of the set $F_M$ equals 0, and we
wish to study the Hausdorff dimension $s_M$ of $F_M$. We obtain three main results.

The first Theorem provides a (mathematical) characterization of the Hausdorff
dimension $s_M$ of $F_M$ as a solution of a differential system.

Theorem 1.1. Consider the set $F_M$ relative to a triple (I, T, c) of GLG-type. Denote
by $H_{s,w}$ the weighted operator relative to the triple (I, T, c) defined by

$H_{s,w}[f] := \sum_{m \in \mathcal{M}} \exp[w\,c(m)] \cdot |h'_m|^s \cdot f \circ h_m$,

and by $\Lambda(s, w)$ the logarithm of its dominant eigenvalue when $H_{s,w}$ acts on $C^1(I)$.
Then, for any $\gamma(c) < M < \mu(c)$, there exists a unique pair $(s_M, w_M) \in [0, 1] \times ]-\infty, 0[$
for which the two relations hold:

$(\mathcal{S}) : \Lambda(s, w) = Mw$, $\frac{\partial}{\partial w} \Lambda(s, w) = M$.

If, furthermore, the second derivative $\Lambda''_{w^2}(s_M, w_M)$ is non zero, then the Hausdorff
dimension of $F_M$ equals $s_M$. The two functions $M \mapsto s_M$, $M \mapsto w_M$ are analytic
at any point $M \in ]\gamma(c), \mu(c)[$ for which $\Lambda''_{w^2}(s_M, w_M) \ne 0$.

We now focus on boundary triples: when M tends to $\mu(c) = +\infty$, the dimension
$s_M$ tends to 1 and we describe the exact asymptotic behaviour of $|s_M - 1|$.
The following two results prove that, in each case (Riemann memoryless sources
or Continued Fraction framework), the speed of convergence of $s_M$ towards 1 is
exponential in M, and exhibit the precise convergence rate.



Theorem 1.2. The Hausdorff dimension $s_M$ of the set $F_M$ relative to the boundary
Riemann triple $\mathcal{BR}(\alpha)$ satisfies, for $M \to \infty$

\sm - 1 | 



with 



' (»- DC, (<.)/.(») + 0(®p[-«*l) 

h{a) = ~ logC(o;), and any 6 < 2{a — 



Theorem 1.3. The Hausdorff dimension $s_M$ of the set $F_M$ relative to the triple
formed with the Euclidean system and the cost $c(m) = m$ satisfies, for $M \to \infty$,



\sm - 1| = * 2 ^[1 -h 0(0 ^)] for any 0 <2. 



7T 



1.7. Relation with Multifractal Analysis 

This work is partially related to Multifractal Analysis introduced by Mandelbrot
[26]. See also [9, 10]. For a dynamical system (I, T), for any n and any x, the
interval $I^{(n)}(x)$ is the fundamental interval of depth n which contains x. Each
fundamental interval has two measures, the Lebesgue measure and another measure
$\nu$ which is defined by the cost c. More precisely, for costs c which give rise to a
series $\sum_m \exp[-c(m)] = 1$, the measure $\nu$ of a fundamental interval $I_{\mathbf{m}}$ related to
the prefix $\mathbf{m} := (m_1, m_2, \ldots, m_n)$ is defined by $|\log \nu(I_{\mathbf{m}})| = \sum_{i=1}^{n} c(m_i)$. In this
way, with respect to this measure $\nu$, the numeration process is memoryless and
always produces the digit m with probability $\exp[-c(m)]$. In order to compare the







two measures, the Lebesgue measure, and the measure $\nu$, Multifractal Analysis
introduces the set $G_\beta$ of real x for which

$B_n(x) := \frac{\log \nu(I^{(n)}(x))}{\log |I^{(n)}(x)|} = \frac{\sum_{i=1}^{n} c(m_i(x))}{\bigl|\log |I^{(n)}(x)|\bigr|}$ satisfies $\lim_{n \to \infty} B_n(x) = \beta$ (11)

and studies the Hausdorff Dimension of the set $G_\beta$.

For Dynamical systems of the Good Class, the sequence $-(1/n) \log |I^{(n)}(x)|$ tends
almost everywhere to the entropy h, so that the asymptotic behaviour of the two
sequences $M_n(x)$ and $h B_n(x)$ defined in (7) and (11) is the same almost
everywhere. However, this is only true “almost everywhere”, and finally, the relation
between the two Hausdorff dimensions $s_M$ and $t_\beta$ is not so clear, and it is of great
interest to compare our result on $F_M$ to the following result on $G_\beta$, recently
obtained by Hanus, Mauldin and Urbanski [12] which we translate in our setting.



Theorem 1.4. [12] Consider the set $G_\beta$ relative to a triple (I, T, c) of GMG-type.
Suppose furthermore that the cost c satisfies $\sum_m \exp[-c(m)] = 1$. Denote by $H_{s,w}$
the weighted operator relative to the triple (I, T, c) and by $\Lambda(s, w)$ the logarithm of
its dominant eigenvalue. Then, for any $\beta$ near the value $\beta_0 = \mu(c)/h$, there exists
a unique pair $(t, w) = (t_\beta, w_\beta) \in [0, 1] \times ]-\infty, +\infty[$ for which the two relations
hold:

Pi Pi 

(S) : A{t - /3w,w) = 0, ~ l3vu,w) = -(3—A{t - f3w,uj). 

The Hausdorff dimension of $G_\beta$ equals $t_\beta$. The two functions $\beta \mapsto t_\beta$, $\beta \mapsto w_\beta$ are
analytic when $\beta$ is near $\beta_0$.

Note that, even if the two results (our Theorem 1.1, and the previous Theorem)
are of the same spirit and involve the same kind of systems (T) and (S), the result
on $G_\beta$ is obtained in the GMG setting, while ours is obtained in the GLG setting.
This explains why the methods used cannot be similar: they both deal with the
weighted transfer operator $H_{s,w}$. However, the authors in [12] used analyticity of
$(s, w) \mapsto H_{s,w}$ at (1, 0), together with ergodic theorems. These properties are no
longer true in the GLG setting, and we have to introduce other tools, similar to
those used in Large Deviations results.

1.8. Main steps of the proof 

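The two sequences of subsets

$\mathcal{A}_n(M) := \{\mathbf{m} \in \mathcal{M}^n;\ c(\mathbf{m}) \le Mn\}$, $\mathcal{B}_n(M) := \bigcap_{r=1}^{n} \mathcal{A}_r(M)$ (12)

are useful for defining the set $F_M$, in two different ways,

$F_M = \bigcap_{n \ge 1} \bigcup_{\mathbf{m} \in \mathcal{A}_n(M)} I_{\mathbf{m}}$, or $F_M = \bigcap_{n \ge 1} \bigcup_{\mathbf{m} \in \mathcal{B}_n(M)} I_{\mathbf{m}}$,

by means of fundamental intervals $I_{\mathbf{m}}$. There is a close link between these two
sequences, due to the next property (proven in the full paper [6]). It is of the
same spirit as the so-called “Cyclic Lemma”, which is useful in the Random Walk
Setting.

Cyclic Lemma. For any $\mathbf{m} \in \mathcal{A}_n(M)$, there exists a circular permutation $\tau$ for
which $\tau(\mathbf{m})$ belongs to $\mathcal{B}_n(M)$.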
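
The Cyclic Lemma has a short constructive reading, sketched below under the interpretation that a word belongs to the second family when each of its prefixes of length r has total cost at most Mr. Writing $S_j$ for the sum of $c(m_i) - M$ over the first j digits, one starts the rotation just after an index where $S_j$ is maximal over $0 \le j < n$. This is the classical cycle-lemma argument and is not claimed to be the proof of the full paper; the function names and the default cost c(m) = m are illustrative.

```python
def in_A(word, M, cost=lambda m: m):
    """Total cost at most M * n for a word of length n."""
    return sum(cost(m) for m in word) <= M * len(word)


def in_B(word, M, cost=lambda m: m):
    """Every prefix of length r has total cost at most M * r."""
    total = 0
    for r, m in enumerate(word, start=1):
        total += cost(m)
        if total > M * r:
            return False
    return True


def rotate_into_B(word, M, cost=lambda m: m):
    """Return a circular rotation of `word` whose prefixes all satisfy the bound."""
    word = tuple(word)
    if not in_A(word, M, cost):
        raise ValueError("the word must have total cost at most M * n")
    best_j, best, s = 0, 0, 0
    for j, m in enumerate(word[:-1], start=1):
        s += cost(m) - M          # s = S_j, partial sum of cost(m_i) - M
        if s > best:
            best, best_j = s, j   # remember where S_j is maximal
    return word[best_j:] + word[:best_j]


if __name__ == "__main__":
    print(rotate_into_B((3, 1, 1), 2))  # (1, 1, 3)
```

After the rotation, a partial sum is either $S_r - S_{j^*}$ or $S_n + S_r - S_{j^*}$, and both are non-positive because $S_{j^*}$ is maximal and $S_n \le 0$.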







Furthermore, both sequences will be useful: the sequence $\mathcal{B}_n(M)$, in studies on
coverings (Section 2), and the sequence $\mathcal{A}_n(M)$, in studies on transfer operators
(Section 3). Section 4 provides the proof of Theorem 1.1, while Section 5 is devoted
to proving Theorems 1.2 and 1.3.



2. Hausdorff dimension of sets constrained by their prefixes 

We first recall some classical facts about coverings and Hausdorff dimension. The 
definition of Hausdorff dimension of a given set a priori involves all its possible 
coverings. Here, we introduce a class of sets (the sets which are well-constrained 
by their prefixes) which contains all the sets Fm relative to triples of large growth. 
We prove in Proposition 1 that, for such sets, the Hausdorff dimension can be 
determined via particular coverings, formed with fundamental intervals of fixed 
depth. For sets $F_M$, this characterization involves sets $\mathcal{B}_n(M)$ of (12).

2.1. Coverings and Hausdorff dimension 

Let E be a subset of I. A covering $\mathcal{J} := (J_\ell)_{\ell \in L}$ of E is a set of open intervals
$J_\ell$ for which $E \subset \bigcup_{\ell \in L} J_\ell$. It is finite if card L is finite. The diameter
of a covering is the real $\rho$ that is the supremum of the lengths $|J_\ell|$. A covering is
fundamental [with respect to some dynamical system (I, T)] if its elements are
fundamental intervals. For each covering $\mathcal{J}$ of E, the quantity $\Gamma_\sigma(\mathcal{J}) := \sum_{J \in \mathcal{J}} |J|^\sigma$
plays a fundamental role in the following.

A subset E of I has zero measure in dimension $\sigma$ (i.e., $\mu_\sigma(E) = 0$) if for any $\varepsilon > 0$,
there exists a covering $\mathcal{K}$ of E for which $\Gamma_\sigma(\mathcal{K}) < \varepsilon$. A subset E of I has an infinite
measure in dimension $\sigma$ (i.e., $\mu_\sigma(E) = \infty$) if for any $A > 0$, there exists $\rho > 0$,
such that, for any covering $\mathcal{K}$ of E of diameter at most $\rho$, one has $\Gamma_\sigma(\mathcal{K}) > A$.
The Hausdorff dimension of E, denoted by dim(E), is the unique number d for
which $\mu_\sigma(E) = 0$ for any $\sigma > d$ and $\mu_\sigma(E) = +\infty$ for any $\sigma < d$,

$\dim(E) = \inf\{\sigma;\ \mu_\sigma(E) = 0\} = \sup\{\sigma;\ \mu_\sigma(E) = +\infty\}$.

2.2. Sets which are well-constrained by their prefixes 

We are interested in studying sets of the same type as $F_M$, and we will consider in
this section a class of more general sets which are defined by constraints on their
prefixes of any length.

Definition 4. [WCP sets] Let $(I, T)$ be a dynamical system of the Good Class, and
$\mathcal{M}$ its associated alphabet. A subset $E$ is defined by its prefixes if there exists a
sequence $\mathcal{M}_\star := (\mathcal{M}_n)_{n\ge 1}$ of non-empty subsets $\mathcal{M}_n \subset \mathcal{M}^n$ (the constraints) for
which

$$E = \bigcap_{n\ge 1}\ \bigcup_{m\in\mathcal{M}_n} I_m.$$

The sequence $\mathcal{M}_\star$ is the canonical sequence of $E$. Moreover, if the sequence of
constraints satisfies the following four conditions,

(i) For any $n \ge 1$, the set $\mathcal{M}_n$ is finite,

(ii) If $(m_1, \ldots, m_n) \in \mathcal{M}_n$ then $(m_1, \ldots, m_{n-1}) \in \mathcal{M}_{n-1}$,

(iii) $\mathcal{M}_{n_1} \times \mathcal{M}_{n_2} \subset \mathcal{M}_{n_1+n_2}$ for all $n_1, n_2$,

(iv) $\pi_n := \min\{p_{m_n};\ \exists (m_1, m_2, \ldots, m_{n-1})$ s.t. $(m_1, m_2, \ldots, m_n) \in \mathcal{M}_n\}$
satisfies $\lim \pi_n^{1/n} = 1$,

the sequence $\mathcal{M}_\star$ is said to be well-conditioned. In this case, the set $E$ is said to
be well-constrained by its prefixes. For each $n$, the set $\mathcal{J}_n := \{I_m;\ m \in \mathcal{M}_n\}$ is







a covering of $E$, which is finite and fundamental. The sequence $(\mathcal{J}_n)$ is called the
canonical system of coverings of $E$.
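Conditions (ii) and (iii) can be checked by brute force on a small case. The sketch below is only an illustration: it assumes that the constraint sets are the words all of whose prefix cost averages are at most $M$ (the informal description of the sets studied here), with the cost $c(m) = m$; the helper names `prefix_sets`, `is_prefix_closed` and `is_concat_closed` are hypothetical.

```python
def prefix_sets(n, M, cost=lambda m: m):
    """Words (m_1..m_n) of positive integers whose prefix cost averages are all <= M.

    Assumption: `cost` is increasing and unbounded, so each extension loop stops.
    """
    words = {()}
    for k in range(1, n + 1):
        extended = set()
        for word in words:
            used = sum(cost(m) for m in word)
            m = 1
            # Keep the digit m while the prefix of length k has total cost <= M * k.
            while used + cost(m) <= M * k:
                extended.add(word + (m,))
                m += 1
        words = extended
    return words


def is_prefix_closed(n, M):
    """Condition (ii): dropping the last digit of a word of depth n stays admissible."""
    shorter = prefix_sets(n - 1, M)
    return all(word[:-1] in shorter for word in prefix_sets(n, M))


def is_concat_closed(n1, n2, M):
    """Condition (iii): concatenating two admissible words gives an admissible word."""
    longer = prefix_sets(n1 + n2, M)
    return all(u + v in longer
               for u in prefix_sets(n1, M)
               for v in prefix_sets(n2, M))
```

With $M = 2$, the depth-2 set contains exactly the words (1,1), (1,2), (1,3), (2,1), (2,2), and each set is finite, as required by condition (i).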

The following proposition (proven in [6]) shows that the Hausdorff dimension of a
set $E$ which is well-constrained by its prefixes can be uniquely characterized via
its canonical system of coverings $(\mathcal{J}_n)$.

Proposition 2.1. [Characterization of the Hausdorff dimension of a WCP set via
its canonical system of coverings.] Let $E$ be a subset of $I$, which is well-constrained
by its prefixes, and $(\mathcal{J}_n)_{n\ge 1}$ its canonical system of coverings. Then

$$\dim(E) = \inf\{\sigma;\ \sup\{\Gamma_\sigma(\mathcal{J}_n);\ n \ge 1\} < \infty\}.$$
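As a toy illustration of this characterization, consider a memoryless system with finitely many affine branches of lengths $p_1, \ldots, p_k$ and no constraint on the digits. This setting is an assumption made for the example, not one of the systems of the paper. The depth-$n$ covering then satisfies $\Gamma_\sigma(\mathcal{J}_n) = (\sum_i p_i^\sigma)^n$, so the supremum over $n$ is finite exactly when $\sum_i p_i^\sigma \le 1$, and the infimum is the root of $\sum_i p_i^\sigma = 1$. The function names are hypothetical.

```python
import math


def gamma_sigma(lengths, sigma, depth):
    """Gamma_sigma of the depth-n covering of a memoryless affine system."""
    return sum(p ** sigma for p in lengths) ** depth


def critical_sigma(lengths, tol=1e-12):
    """Bisection for the root of sum(p ** sigma) = 1 on [0, 1].

    Assumes at least two branches and sum(lengths) <= 1, so the root lies in [0, 1].
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(p ** mid for p in lengths) > 1.0:
            lo = mid  # Gamma_sigma(J_n) grows with n: sigma is too small
        else:
            hi = mid  # Gamma_sigma(J_n) stays bounded
    return hi


# Two branches of length 1/3: the middle-third Cantor set.
cantor_dimension = critical_sigma([1 / 3, 1 / 3])
```

For the middle-third Cantor set the root is $\log 2/\log 3$, the classical value of its Hausdorff dimension.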

Proof. It mainly uses Condition (iv) of Definition 4, which is equivalent to a
condition introduced by J. Peyrière [29],

$$\forall a > 0,\qquad \lim\{r(J)\cdot |J|^{a};\ J \in \mathcal{J}_\star,\ |J| \to 0\} < 1.$$

Here, for any $J \in \mathcal{J}_\star$, the quantity $r(J)$ relates the length of $J$ to the lengths of
fundamental intervals $K \subset J$, when the depths $d(K), d(J)$ satisfy $d(K) = d(J) + 1$,

$$r(J) = \sup\left\{\frac{|J|}{|K|};\ K \subset J,\ K \in \mathcal{J}_\star,\ d(K) = d(J) + 1\right\}.$$

We denote by $A_n(M,\sigma)$ and $B_n(M,\sigma)$ the associated quantities (defined in
Section 2.1), related to the constraints $\mathcal{A}_n(M)$, $\mathcal{B}_n(M)$ defined in (12),

$$A_n(M,\sigma) := \sum_{m\in\mathcal{A}_n(M)} p_m^{\sigma}, \qquad B_n(M,\sigma) := \sum_{m\in\mathcal{B}_n(M)} p_m^{\sigma}. \qquad (13)$$

The following result summarizes the results of this Section. Its proof uses
Proposition 2.1 when applied to the WCP set $F_M$, together with the relation between
$A_n(M, s)$ and $B_n(M, s)$ due to the Cyclic Lemma, and Property (2).

Corollary 2.2. Consider a triple $(I, T, c)$ of GLG-type. For $M > \gamma(c)$, the sequence
$\mathcal{B}_n(M)$ defined in (12) is well-conditioned and the set $F_M$ is well-constrained by
its prefixes. The Hausdorff dimension of the set $F_M$ satisfies

$$\dim(F_M) = \inf\{\sigma;\ \sup_n B_n(M,\sigma) < \infty\} = \inf\{\sigma;\ \sup_n A_n(M,\sigma) < \infty\}.$$



3. The main tool: the weighted transfer operator 

In this Section we introduce our main tool: the weighted transfer operator. In the
following section, this operator will provide useful information on the asymptotic
behaviour of the sequences $A_n(M,s)$, $B_n(M,s)$. Here, we summarize its main
well-known properties and, more precisely, its dominant spectral properties.

3.1. Transfer operators 

Consider a dynamical system $(I, T)$ of the Good Class with a cost $c$. The weighted
transfer operator $H_{m,s,w}$ relative to a prefix $m \in \mathcal{M}^\star$ is defined as

$$H_{m,s,w}[f](x) := \exp[w\,c(m)] \cdot |h_m'(x)|^{s} \cdot f \circ h_m(x).$$

Due to the additivity of the cost (6) and the multiplicativity of the derivative, the
operators satisfy a fundamental composition property

$$H_{m,s,w} \circ H_{n,s,w} = H_{n\cdot m,s,w}.$$

The weighted transfer operator $H_{s,w}$ is defined as the sum of all $H_{m,s,w}$ relative
to a symbol $m \in \mathcal{M}$, and the composition property entails that the $n$-th iterate
of $H_{s,w}$ satisfies







$$H^{n}_{s,w} = \sum_{m\in\mathcal{M}^n} H_{m,s,w} \qquad \text{for any } n \ge 1.$$

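For $w = 0$, the operator $H_{m,s,w}$ coincides with the classical transfer operator

$$H_{m,s}[f](x) := |h_m'(x)|^{s} \cdot f \circ h_m(x), \qquad (14)$$

which is closely related to the Lebesgue measure $p_m$ of the fundamental interval $I_m$.
The length satisfies $p_m = |h_m(0) - h_m(1)| = |h_m'(\theta_m)|$ for some $\theta_m \in\, ]0,1[$,
and Distortion Property (2) entails that, for any $m \in \mathcal{M}^\star$,

$$L^{-s}\cdot H_{m,s}[1](0) \le p_m^{s} \le L^{s}\cdot H_{m,s}[1](0).$$

Then, the sequence

$$D_n(M,s) := \sum_{m\in\mathcal{A}_n(M)} H_{m,s}[1](0) \qquad (15)$$

satisfies $L^{-s}\cdot A_n(M,s) \le D_n(M,s) \le L^{s}\cdot A_n(M,s)$, and provides another
characterization of the Hausdorff dimension of $F_M$,

$$\dim(F_M) = \inf\{s;\ \sup_{n\in\mathbb{N}} D_n(M,s) < \infty\}. \qquad (16)$$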

3.2. Functional analysis 

For a complete treatment of transfer operators, see [1]. Here, the functional space
$C^1(I)$ is endowed with the norm $\|\cdot\|_1$ defined by $\|f\|_1 := \sup\{|f(t)|;\ t \in I\} +
\sup\{|f'(t)|;\ t \in I\}$. When $(\sigma := \Re s,\ \nu := \Re w)$ belongs to the set

$$\mathcal{S} = \{(\sigma,\nu);\ \sum_{m\in\mathcal{M}} \exp[\nu\,c(m)] \cdot \delta_m^{\sigma} < \infty\},$$

the norm $\|H_{s,w}\|_1$ is bounded and $H_{s,w}$ acts on $C^1(I)$. For a triple $(I, T, c)$ of
GLG-type, one has

$$\mathcal{S} \supset \mathcal{S}_1 \quad\text{with}\quad \mathcal{S}_1 := \{(\sigma,\nu);\ \sigma > 0,\ \nu < 0\} \cup \{(\sigma,0);\ \sigma > \sigma_0\}.$$

In addition, at any point $(s,w)$ for which $(\Re s, \Re w)$ is in $\mathrm{Int}(\mathcal{S}_1)$, the operator
$H_{s,w} : C^1(I) \to C^1(I)$ is analytic with respect to both variables. Remark that the
analyticity with respect to $w$ is no longer true on the frontier of $\mathcal{S}_1$, when $s > \sigma_0$
and $w = 0$. This is why the GLG-setting is not so easy to handle.

3.3. Spectral properties 

For $(s, w)$ in $\mathrm{Int}(\mathcal{S}_1)$, the weighted operator $H_{s,w}$ satisfies a Perron-Frobenius
property: there is a unique dominant eigenvalue $\lambda(s, w)$ which is positive and simple, and
the corresponding eigenfunction $f_{s,w}$ is positive on $I$. The corresponding eigenmeasure
for the dual operator $H^\star_{s,w}$ is denoted by $\nu_{s,w}$. The pair $(f_{s,w}, \nu_{s,w})$ is unique
via the normalization condition $\int_I f_{s,w}\,d\nu_{s,w} = 1$. Note that, in the case when the
branches are affine, the eigenfunction $f_{s,w}$ equals 1, and the eigenvalue $\lambda(s,w)$ is
explicit,

$$\lambda(s,w) = \sum_{m\in\mathcal{M}} \exp[w\,c(m)]\; p_m^{s}. \qquad (17)$$

Furthermore, $H_{s,w}$ is quasi-compact, and there is a spectral gap which separates
$\lambda(s,w)$ from the remainder of the spectrum. Since $H_{s,w}$ depends analytically on
$(s,w)$, perturbation theory applies [22], and entails, for $(s,w)$ sufficiently near
$\mathrm{Int}(\mathcal{S}_1)$, the existence of dominant spectral objects which depend analytically on
$s$ and $w$. Finally, the operator decomposes as $H_{s,w} = \lambda(s,w)\,P_{s,w} + N_{s,w}$. Here,
$P_{s,w}$, defined by $P_{s,w}[f](x) := f_{s,w}(x) \cdot \nu_{s,w}[f]$, is the projection on the eigenspace,
the spectral radius $\rho_{s,w}$ of $N_{s,w}$ is strictly less than $|\lambda(s,w)|$, and $P_{s,w} \circ N_{s,w} =$







$N_{s,w} \circ P_{s,w} = 0$. For any $a$ strictly larger than the ratio $\rho_{s,w}/|\lambda(s,w)|$, and for
any $f \in C^1(I)$, $f > 0$, a quasi-powers property holds, and we have

$$H^{n}_{s,w}[f](x) = \lambda(s,w)^{n} \cdot P_{s,w}[f](x) \cdot [1 + O(a^{n})] \qquad (18)$$

for $x \in I$ and $(s,w)$ near $\mathrm{Int}(\mathcal{S}_1)$.

For $w = 0$, the weighted operator coincides with the transfer operator, and we
omit the index $w$. For $(s,w) = (1,0)$, the operator coincides with the density
transformer. Then, the dominant spectral objects of $H_{1,0}$ satisfy the following:

$\lambda(1,0) = 1$, $f_{1,0} = f_1$ = stationary density, $\nu_{1,0} = \nu_1$ = Lebesgue measure.
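For affine branches the explicit formula (17) can be evaluated directly. The sketch below takes, as an assumption, a Riemann-type memoryless system with $p_m = m^{-2}/\zeta(2)$ and cost $c(m) = m$, and truncates the series; `eigenvalue` and its `terms` parameter are hypothetical names, and the truncation error at $w = 0$ is of order `1/terms`.

```python
import math

ZETA_2 = math.pi ** 2 / 6  # zeta(2), normalizes the lengths p_m = m^-2 / zeta(2)


def eigenvalue(s, w, terms=100_000):
    """Truncated version of (17): sum over m of exp(w * c(m)) * p_m ** s, with c(m) = m."""
    return sum(math.exp(w * m) * (m ** -2.0 / ZETA_2) ** s
               for m in range(1, terms + 1))
```

At $(s,w) = (1,0)$ the sum is the total length of the fundamental intervals, so the value is close to $\lambda(1,0) = 1$; for $w < 0$ it decreases, and for fixed $w$ it increases as $s$ decreases.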

3.4. Influence of Condition GLG on the derivative of the dominant eigenvalue

For real pairs $(s,w) \in \mathcal{S}_1$, denote by $\Lambda(s,w)$ the logarithm of the dominant
eigenvalue, i.e., $\Lambda(s,w) := \log\lambda(s,w)$. Then, since the alphabet $\mathcal{M}$ is infinite, and the
cost $c$ is not constant (that is always the case for a cost $c$ of large growth), the
map $(s, w) \mapsto \Lambda(s, w)$ is strictly decreasing and strictly convex (see the full paper
[6]). The quantity $\Lambda'_s(1,0)$ equals the opposite of the entropy $h$ of the system.

It is important to describe the behaviour of the derivative $\lambda'_w(s,w)$ with respect to $w$, when
$w \to 0^-$. For $s > \sigma_0$ and $w < 0$, the (fundamental) equality

$$\lambda'_w(s,w) = \sum_{m\in\mathcal{M}} c(m)\,\exp[w\,c(m)] \int_I |h_m'(t)|^{s} \cdot f_{s,w}\circ h_m(t)\; d\nu_{s,w}(t) \qquad (19)$$

proves that, when $w \to 0^-$,

$$\lim \lambda'_w(s,w) = \mu(c,s) := \sum_{m\in\mathcal{M}} c(m) \int_I |h_m'(t)|^{s} \cdot f_s\circ h_m(t)\; d\nu_s(t).$$

This limit is finite if and only if $P(s)$ defined in (8) is finite. In all the cases, the
derivative $\lambda'_w(1,w)$ admits a limit (finite or infinite) when $w \to 0^-$. This limit is
the average value $\mu(c)$ of the cost $c$ with respect to the stationary density $f_1$,

$$\mu(c,1) := \sum_{m\in\mathcal{M}} c(m) \int_{I_m} f_1(t)\,dt = \mu(c).$$

4. Hausdorff dimension and dominant eigenvalues 

This Section is devoted to proving Theorem 1.1, which relates the Hausdorff
dimension to the root of a differential system that involves the dominant eigenvalue of
the weighted transfer operator. The proof deals with tools that are often used for
proving Large Deviation results, since it strongly uses the Quasi-Powers Theorem
together with a well-known technique called “shifting of the mean” [5]. We use the
sequence $D_n(M,s)$ defined in (15) and we begin with Relation (16),

$$\dim(F_M) = \inf\{s;\ \sup_{n\in\mathbb{N}} D_n(M,s) < \infty\}.$$

We wish to relate the sequence $D_n(M,s)$ to the dominant eigenvalue $\lambda(s, w)$ of the
weighted operator $H_{s,w}$. We first describe some useful properties of the function
$\Lambda(s,w) := \log\lambda(s,w)$ which are closely related to its strict convexity.

Lemma 4.1. Consider a triple of GLG-type. For $M < \mu(c)$, denote by $V_M$ the
intersection of $[0,1]$ with the largest neighborhood of $s = 1$ for which $\mu(c,s) > M\lambda(s)$.
For any pair $(s, M)$ which satisfies $\gamma(c) < M < \mu(c)$ and $s \in V_M$, the following is
true: (a) there exists a unique $w = \eta(s,M) < 0$ for which $\frac{\partial}{\partial w}\Lambda(s,w)\big|_{w=\eta} = M$.







(b) The function $]-\infty, 0[\ \to \mathbb{R}$ that associates to $w$ the quantity $\lambda(s,w)\,e^{-Mw}$
attains its minimum at $w = \eta(s, M)$. The minimal value is denoted by $\alpha_M(s)$,

$$\alpha_M(s) := \min\{\exp[-Mw] \cdot \lambda(s,w);\ w \in\, ]-\infty, 0[\}.$$

Proof. Since the function $\Lambda_{[s]} : w \mapsto \log\lambda(s, w)$ is strictly convex, its first derivative
$\Lambda'_{[s]}$ is strictly increasing. It then admits a limit when $w$ tends to $0^-$ or when $w$
tends to $-\infty$. We prove in [6] that $\lim_{w\to-\infty} \Lambda'_{[s]}(w) = \gamma(c)$. On the other hand,
we recall that $\lim_{w\to 0^-} \Lambda'_{[s]}(w) = \mu(c, s)/\lambda(s)$ defines a continuous function of $s$
which equals $\mu(c)$ for $s = 1$. Then, for any $M < \mu(c)$, there exists a neighborhood
of $s = 1$ on which $\lim_{w\to 0^-} \Lambda'_{[s]}(w) > M$. ■

Remark. When $\mu(c) = +\infty$, the series $P(s)$ defined in (8) is divergent for any
$s \in [0,1]$, and $\mu(c, s) = +\infty$ for any $s \in [0,1]$. Then, one can choose $V_M :=
[0,1]$. In the general case, denote by $\sigma_M$ the largest $s \in [0,1]$ for which $\mu(c, s) \le
M\lambda(s)$. One can choose $V_M =\, ]\sigma_M, 1]$. Then $\eta(s, M)$ tends to 0 when $s \to \sigma_M$, and
$\lim_{s\to\sigma_M} \alpha_M(s) = \lambda(\sigma_M, 0) > 1$.

The following result relates the sequence $D_n(M,s)$ and $\alpha_M(s)$.

Lemma 4.2. Consider any pair $(s, M)$ which satisfies the following three
conditions: $\gamma(c) < M < \mu(c)$, $s \in V_M$ and $\Lambda''_{w^2}(s, \eta(s, M)) \ne 0$. Then, the sequence
$[D_n(M,s)]^{1/n}$ admits a limit when $n \to \infty$ and this limit equals $\alpha_M(s)$.



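Proof. We relate the sequence $D_n(M, s)$ to the $n$-th iterate of the weighted transfer
operator, via the quantity $\exp[-wnM] \cdot H^{n}_{s,w}$. First, note that, for real $(s,w)$ in $\mathrm{Int}(\mathcal{S}_1)$,
one has $w < 0$ and $c(m) \le Mn$ for any $m \in \mathcal{A}_n(M)$, so that

$$D_n(M,s) = \sum_{m\in\mathcal{A}_n(M)} H_{m,s}[1](0) \le e^{-wnM} \cdot H^{n}_{s,w}[1](0),$$

and Property (18) ensures the existence of a function $a(s,w)$ such that

$$D_n(M,s) \le a(s,w)\left(\frac{\lambda(s,w)}{e^{Mw}}\right)^{n}. \qquad (20)$$

This is true in particular when $w$ equals the value $\eta(s, M)$ of Lemma 4.1, and

$$\limsup_{n\to\infty}\ [D_n(M,s)]^{1/n} \le \alpha_M(s). \qquad (21)$$

We now prove the converse inequality, with classical techniques that are well known
as the shifting of the mean [5]. Remark that $D_n(M, s)$ satisfies

$$D_n(M,s) = \sum_{r\le Mn} p^{(n)}(r) \quad\text{with}\quad p^{(n)}(r) := \sum_{m\in\mathcal{M}^n,\ c(m)=r} H_{m,s}[1](0). \qquad (22)$$

For $\eta = \eta(s, M)$ of Lemma 4.1, consider the random variables $Z_n^{(s,\eta)}$ defined by

$$\Pr[Z_n^{(s,\eta)} = r] := \frac{e^{\eta r}\, p^{(n)}(r)}{H^{n}_{s,\eta}[1](0)}.$$

Their generating functions $E[\exp(w Z_n^{(s,\eta)})]$ can be expressed in terms of the $n$-th
iterates of the weighted transfer operators $H_{s,\eta}$ and $H_{s,w+\eta}$,

$$E[\exp(w Z_n^{(s,\eta)})] = \frac{H^{n}_{s,w+\eta}[1](0)}{H^{n}_{s,\eta}[1](0)}. \qquad (23)$$

Since $\eta$ is strictly negative, there exists a (complex) neighborhood of $w = 0$ for
which $\Re(w + \eta) < \eta_0 < 0$. Then, the spectral decomposition holds and entails a
quasi-power expression for the moment generating function.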







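$$E[\exp(w Z_n^{(s,\eta)})] = a(s,w,\eta) \cdot \left(\frac{\lambda(s,w+\eta)}{\lambda(s,\eta)}\right)^{n} \cdot [1 + O(a^{n})] \qquad (24)$$

where $a(s, w, \eta)$ is bounded and $a$ is related to the spectral gap of the operators $H_{s,\eta}$
and $H_{s,w+\eta}$. If $\eta$ and $w + \eta$ belong to compact sets included in $\Re w < 0$, we can
choose $|a| < a_0$ uniformly in $w$. Moreover, the function $U_{[s]}$ defined by $U_{[s]}(w) :=
\Lambda_{[s]}(w + \eta) - \Lambda_{[s]}(\eta)$ is analytic around $w = 0$ because $\Lambda_{[s]}(w)$ is analytic around
$\eta < 0$. At $w = 0$, the derivative $U'_{[s]}(0)$ equals $\Lambda'_{[s]}(\eta) = M$. Furthermore, the
second derivative is non zero by hypothesis.

We then apply the Quasi-Powers Theorem [17, 18, 19], which proves that the
variables $Z_n^{(s,\eta)}$ follow an asymptotic Gaussian law. The probability

$$\Pi_n := \Pr\left[Z_n^{(s,\eta)} \in [Mn - \sqrt{n}, Mn]\right] = \sum_{Mn-\sqrt{n} < r \le Mn} \frac{e^{\eta r}\, p^{(n)}(r)}{H^{n}_{s,\eta}[1](0)}$$

can be approximated by the corresponding probability of the Gaussian distribution,

$$\Pi_n = b(s) + O\!\left(\frac{1}{\sqrt{n}}\right) \quad\text{with}\quad b(s) := \frac{1}{\sqrt{2\pi}} \int_{-1/\sigma(s)}^{0} e^{-t^2/2}\,dt, \qquad \sigma(s)^2 := U''_{[s]}(0).$$

For large enough $n$, one has $\Pi_n > b(s)/2$, so that, with (22) and (18),

$$\sum_{Mn-\sqrt{n} < r \le Mn} e^{\eta r}\, p^{(n)}(r) = \Pi_n \cdot H^{n}_{s,\eta}[1](0) \ge \frac{b(s)}{2}\, H^{n}_{s,\eta}[1](0) \ge d(s,\eta)\,\lambda(s,\eta)^{n},$$

for some bounded function $d(s,\eta)$. Since $\eta < 0$, each term of this sum satisfies
$p^{(n)}(r) \ge e^{-\eta(Mn - \sqrt{n})}\, e^{\eta r} p^{(n)}(r)$. Therefore,

$$D_n(M,s) \ge e^{\eta\sqrt{n}}\, d(s,\eta) \cdot \left(e^{-\eta M}\lambda(s,\eta)\right)^{n}, \qquad (25)$$

and finally

$$\liminf_{n\to\infty}\ [D_n(M,s)]^{1/n} \ge \alpha_M(s).$$

With (21), this ends the proof of the Lemma. ■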

The next Lemma (proven in [6]) describes the main properties of the function $\alpha_M(s)$
and ends the proof of Theorem 1.1.

Lemma 4.3. Let $M \in\, ]\gamma(c), \mu(c)[$ and $s \in V_M$. The function $s \mapsto \alpha_M(s)$ is
well-defined. On the neighborhood $V_M$, it is a continuous function, strictly decreasing,
and there exists a unique value $s = s_M \in V_M$ such that $\alpha_M(s) = 1$.
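Lemmas 4.1 and 4.3 suggest a two-level bisection for computing $s_M$ when the eigenvalue is explicit. The sketch below is a numerical illustration under stated assumptions, not the authors' algorithm: it takes the memoryless Riemann-type system with $p_m = m^{-2}/\zeta(2)$ and cost $c(m) = m$, truncates the series in (17), restricts $w$ to a fixed window, and searches $s$ in $[0.55, 1]$. All names and constants (`TERMS`, `W_MIN`, `W_MAX`, `eta`, `alpha`, `solve_s`) are hypothetical.

```python
import math

ZETA_2 = math.pi ** 2 / 6
TERMS = 2000               # truncation of the series (assumption)
W_MIN, W_MAX = -40.0, -0.01  # search window for w < 0 (assumption)


def eigen_and_derivative(s, w):
    """Truncated lambda(s, w) and its derivative in w, for p_m = m^-2/zeta(2), c(m) = m."""
    lam = dlam = 0.0
    for m in range(1, TERMS + 1):
        term = math.exp(w * m) * m ** (-2.0 * s)
        lam += term
        dlam += m * term
    norm = ZETA_2 ** s
    return lam / norm, dlam / norm


def eta(s, M, steps=40):
    """Bisection for the w < 0 at which d/dw log lambda(s, w) equals M (Lemma 4.1 (a))."""
    lo, hi = W_MIN, W_MAX
    for _ in range(steps):
        mid = (lo + hi) / 2
        lam, dlam = eigen_and_derivative(s, mid)
        if dlam / lam < M:   # the log-derivative is increasing in w
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def alpha(s, M):
    """Minimal value of exp(-M w) * lambda(s, w) over w < 0 (Lemma 4.1 (b))."""
    w = eta(s, M)
    return math.exp(-M * w) * eigen_and_derivative(s, w)[0]


def solve_s(M, lo=0.55, hi=1.0, steps=30):
    """Bisection for the s with alpha(s, M) = 1, using that alpha decreases in s (Lemma 4.3)."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if alpha(mid, M) > 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Calling `solve_s(2.0)` returns a value strictly between 0.55 and 1 at which `alpha` is numerically equal to 1. For larger $M$ the minimizer $\eta$ approaches 0 and the window and truncation would have to be enlarged.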



5. Asymptotic behaviour of $s_M$ when $M \to +\infty$

In this Section, we consider boundary triples $(I, T, c)$. In this case, the Dirichlet
series $P(s)$ defined in (8) has an abscissa of convergence $s = 1$, is divergent at
$s = 1$, and the average $\mu(c) := \lim_{w\to 0^-} \lambda'_w(1,w)$ is infinite, so that the second
derivative $\Lambda''_{w^2}(s,w)$ tends to $+\infty$ when $w \to 0^-$. It is then strictly positive when
$(s,w)$ is a point of the interior of $\mathcal{S}_1$ sufficiently near $(1,0)$, and Theorem 1.1 can
be applied without any restriction.

We wish to describe the asymptotic behaviour of $\dim(F_M)$ when $M$ goes to
$+\infty$. Since, in this case, the point $(s_M, w_M)$ tends to $(1,0)$, the second
derivative $\Lambda''_{w^2}(s_M, w_M)$ is non zero, and Theorem 1.1 can be applied. We focus on
two particular cases: the Boundary Riemann system (Theorem 1.2 in Section 5.2)
and the Boundary triple relative to the Euclid dynamical system with cost $c(m) = m$
(Theorem 1.3, in Section 5.3).







We first deal with memoryless schemes, where the dominant eigenvalue admits
an explicit expression. The Mellin transform is a very useful tool in this context
(Lemma 5.2). Then, Lemma 5.4 shows that the memoryless source which
“approximates” the continued fraction process provides relevant information about the
dominant eigenvalue of the continued fraction scheme.



5.1. General facts about $s_M$

Since $M$ tends to $+\infty$, we let $z := \exp(-M)$, together with $s(z) := s_M$, $w(z) :=
-w_M$, and we consider the case when $z \to 0^+$. We denote by $\widetilde\lambda(s,w) := \lambda(s, -w)$.

Then, the pair $(s_M, w_M)$ is a solution of system $(S)$ if and only if $(s(z), w(z))$ is a
solution of system

$$(\widetilde S):\qquad \widetilde\lambda(s,w) = z^{w}, \qquad \frac{\partial}{\partial w}\widetilde\lambda(s,w) = \log z \cdot z^{w}.$$

When $z$ varies in $]0, \exp[-\gamma(c)][$, these systems define a parameterized curve
denoted by $\mathcal{C}$, which is the set of points $(s(z), w(z))$. The maps $z \mapsto w(z)$, $z \mapsto s(z)$
are analytic for $z \in\, ]0, \exp[-\gamma(c)][$. When $z = 0$, this is no longer true, and we
wish to describe the curve for $z \to 0$. The point $(s(z), w(z))$ tends to $(1,0)$, and
the following lemma (proven in [6]) describes the behaviour of $s(z) - 1$.



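Lemma 5.1. For $z \to 0^+$, the behaviours of $s'(z)$ and $w(z)$ are related by

$$s'(z) \sim -\frac{1}{h}\,\frac{w(z)}{z}, \qquad 1 - s(z) \sim \frac{1}{h}\int_0^{z} \frac{w(t)}{t}\,dt,$$

where $h = -\Lambda'_s(1,0)$ is the entropy of the dynamical system $(I, T)$.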

We focus now on two particular (boundary) triples: the boundary Riemann triples
$\mathcal{BR}(\alpha)$, and the Euclidean dynamical system with $c(m) = m$.



5.2. Study of the boundary Riemann system 

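We now prove Theorem 1.2. Here, we use the explicit expression (17) of the
dominant eigenvalue $\lambda(s, w)$ together with two main lemmas. Lemma 5.2 describes the
behaviour of the derivative $\lambda'_w(s,w)$ when $(s,w) \to (1,0)$.

Lemma 5.2. For the boundary Riemann system $\mathcal{BR}(\alpha)$, one has, when $(s,w) \to (1,0)$,

$$\zeta(\alpha)^{s}\,\frac{\partial}{\partial w}\widetilde\lambda(s,w) = \frac{1}{\alpha(s-1)}\left(w^{\alpha(s-1)/(\alpha-1)} - 1\right) + \gamma\left[\frac{w^{\alpha(s-1)/(\alpha-1)}}{\alpha-1} - 1\right] + O\!\left(|s-1|\left[1 + w^{\alpha(s-1)/(\alpha-1)}\right]\right) + O(w^{1/2}). \qquad (26)$$

Proof. It uses Mellin transforms. We assume $s$ near 1 and $0 < w < 1$. The function
$-\frac{\partial}{\partial w}\widetilde\lambda(s,w)$ is considered first as a function of $w$ and denoted by $K_s(w)$,

$$K_s(w) := \frac{1}{\zeta(\alpha)^{s}} \sum_{m\ge 1} m^{\alpha-1}\exp[-m^{\alpha-1}w]\,\frac{1}{m^{\alpha s}}.$$

$K_s(w)$ is a harmonic sum (also a version of the generalized polylogarithm) and
the Mellin transform of the function $K_s$ is then explicit,

$$K_s^{*}(u) = \frac{1}{\zeta(\alpha)^{s}}\,\Gamma(u)\,\zeta\bigl(\alpha s + (u-1)(\alpha-1)\bigr).$$

The transform has poles at the nearby points $u = 0$ (due to the function $\Gamma$)
and $u = \alpha(1-s)/(\alpha-1)$ (due to the function $\zeta$), and its existence strip is $\Re(u) >
\alpha(1-s)/(\alpha-1)$. The Mellin inversion theorem yields, for any $D > \alpha(1-s)/(\alpha-1)$,

$$K_s(w) = \frac{1}{\zeta(\alpha)^{s}}\,\frac{1}{2i\pi}\int_{D-i\infty}^{D+i\infty} \Gamma(u)\,\zeta\bigl(\alpha s + (u-1)(\alpha-1)\bigr)\,w^{-u}\,du,$$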







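and shifting the integral to the left leads to

$$\zeta(\alpha)^{s} \cdot K_s(w) = \frac{1}{\alpha-1}\,\Gamma\!\left(\frac{\alpha(1-s)}{\alpha-1}\right) w^{\alpha(s-1)/(\alpha-1)} + \zeta(\alpha s - \alpha + 1) + \frac{1}{2i\pi}\int_{-1/2-i\infty}^{-1/2+i\infty} \Gamma(u)\,\zeta\bigl(\alpha s + (u-1)(\alpha-1)\bigr)\,w^{-u}\,du.$$

By well-known properties of the zeta function (its growth is controlled uniformly
in vertical strips), the remainder integral is $O(w^{1/2})$ uniformly with respect to $s$
in the stated range ($s$ near 1 and $0 < w < 1$). Consequently, one has

$$\zeta(\alpha)^{s} \cdot K_s(w) = \frac{1}{\alpha-1}\,\Gamma\!\left(\frac{\alpha(1-s)}{\alpha-1}\right) w^{\alpha(s-1)/(\alpha-1)} + \zeta(\alpha s - \alpha + 1) + O(w^{1/2}).$$

Assume now that $s \to 1$. Both quantities

$$\zeta(\alpha s - \alpha + 1) - \frac{1}{\alpha(s-1)} - \gamma, \qquad \frac{1}{\alpha-1}\,\Gamma\!\left(\frac{\alpha(1-s)}{\alpha-1}\right) + \frac{1}{\alpha(s-1)} + \frac{\gamma}{\alpha-1}$$

are $O(s-1)$. Thus, when both $w$ and $s - 1$ tend to 0, Relation (26) holds. ■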
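A quick numerical sanity check of this kind of expansion is possible in the simplest case. Assuming $\alpha = 2$ and $s = 1$ (so that the harmonic sum reduces to $\sum_{m\ge1} e^{-wm}/m$, an assumption made for the example), the sum has the closed form $-\log(1 - e^{-w})$ and differs from $-\log w$ by a term of order $w$. The function name `harmonic_sum` is hypothetical.

```python
import math


def harmonic_sum(w, terms=None):
    """Truncated sum of exp(-w * m) / m over m >= 1, for w > 0."""
    if terms is None:
        terms = int(60 / w) + 1  # the neglected tail is below exp(-60)
    return sum(math.exp(-w * m) / m for m in range(1, terms + 1))


w = 0.01
numeric = harmonic_sum(w)
closed_form = -math.log1p(-math.exp(-w))  # -log(1 - exp(-w))
leading_term = -math.log(w)               # dominant term as w -> 0
```

The difference between `numeric` and `leading_term` is about $w/2$, well inside the $O(w^{1/2})$ remainder allowed by the Mellin argument.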



Lemma 5.3. For the boundary Riemann system $\mathcal{BR}(\alpha)$, the quantity $(s(z) - 1)\log z$
tends to 0 when $z$ tends to 0. Moreover $w \asymp z^{(\alpha-1)\zeta(\alpha)}$ for $z \to 0$.

Proof. There are three main steps.

Step 1. We first remark that, for any boundary Riemann system $\mathcal{BR}(\alpha)$, the
quantity $(s(z) - 1)\log z$ is bounded when $z$ tends to 0. This is due to the fact that
the set $F_M$ contains the set $E_K$ of reals whose digits $m$ are less than $K$ with
$K = M^{1/(\alpha-1)}$, together with an easy estimate of the Hausdorff dimension of $E_K$
(see [6]).

Step 2. We now prove that $z$, $w$, and $(s - 1)$ are polynomially related when $z \to 0$.
With Lemmas 5.1 and 5.2, the second equation of the system satisfied by $(s(z), w(z))$, and this last result,
we first deduce that $|s - 1||\log z|$ tends to 0. Using again relation (26) and the
second equation of the system together with this new fact, we obtain that the
difference $1 - w^{\alpha(s-1)/(\alpha-1)}$ tends to 0. This entails that $|s - 1||\log w|$ tends to
0, and this proves that $(\alpha-1)\zeta(\alpha)\,|\log z|$ and $|\log w(z)|$ are equivalent. Then, with

Lemma 5.1, this entails that |s — 1| = (for any 

p > 1) and both wlog^ z and \s — 1| log^ w are for any e > 0. 

Step 3. Finally, with Lemma 5.2, w = when z 0. 



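This ends the proof of Theorem 1.2. Note that the role played by the normalization
constant $(\alpha - 1)\zeta(\alpha)$ is due to the equality

$$\frac{1}{(\alpha-1)\zeta(\alpha)} = \lim \frac{1}{\log w}\,\frac{\partial}{\partial w}\lambda(s,-w), \qquad (w \to 0^+,\ (s-1)\log w \to 0). \qquad (27)$$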

5.3. Study of the Euclid Dynamical System with c{m) = m 

Here, we will prove our Theorem 1.3. We first obtain qualitative results, with a 
comparison between the Euclid dynamical system and the boundary system !BIR(2), 
which plays the role of an approximate model, and is denoted by x. 

Consider a compact set $\mathcal{S}_2 = \{(\sigma,\nu) \in \mathcal{S}_1 :\ \sigma_2 \le \sigma \le 1,\ \nu_1 \le \nu \le 0\}$, with $\sigma_2 > \sigma_0$
and $\nu_1 < 0$. The explicit expression (19) proves (see [6]) the existence of strictly
positive constants $a$, $b$ such that, for any $(s, w) \in \mathcal{S}_2$,

« • w) < w)<b- A^„(s, w). 



(28) 







The first two steps of Lemma 5.3 extend to the Euclidean boundary triple. First,
the result due to Hensley [14] and recalled in (5) proves that $|s - 1|\log z$ is bounded
along $\mathcal{C}$. Then, a step quite similar to Step 2 of Lemma 5.3 shows that $(s - 1)$, $w$
and $z$ are polynomially related when $z \to 0$. Then, when $(s(z), w(z))$ tends to
$(1,0)$ on the curve $\mathcal{C}$, the quantity $(s - 1)\log w$ tends to 0.

In order to extend Step 3 of Lemma 5.3 to our present study, we need an
exact equivalent of $(1/\log w)\,\lambda'_w(s, -w)$, when both $w$ and $(s - 1)\log w$ tend to 0,
which could be similar to (27). The following Lemma is the main step for proving
Theorem 1.3 and shows that $\log 2$ is now the good normalization constant.



Lemma 5.4. Denote by $f_{s,w}$ the dominant eigenfunction of the weighted transfer
operator and by $\nu_{s,w}$ the dominant eigenmeasure of its dual. Denote by $F(s,w)$ the quantity

$$F(s,w) = \sum_{m=1}^{\infty} -m\exp[-wm] \int_0^1 \frac{1}{(m+t)^{2s}}\, f_1\!\left(\frac{1}{m+t}\right) dt,$$

which deals with the stationary density $f_1$. For any constant $B$, $0 < B < \log 2$, the
following holds when $(s(z), w(z))$ tends to $(1,0)$ along the curve $\mathcal{C}$:

(i) $F(s, w) = \frac{1}{\log 2}\log w + O(z^{B})$.

(ii) $\|f_{s,w} - f_1\|_1 = O(z^{B})$, $\|\nu_{s,w} - \nu_1\|_1 = O(z^{B})$.

(iii) $\widetilde\lambda'_w(s,w) - F(s,w) = O(z^{B})$.



Sketch of the proof. For (i), we recall that the stationary density $f_1$ of the Euclidean
System satisfies $(\log 2)\cdot f_1(t) = 1/(1 + t)$, so that

$$F(s,w) = -\frac{1}{\log 2}\sum_{m\ge 1} m\,e^{-wm}\,J_m(s) \quad\text{with}\quad J_m(s) := \int_0^1 \frac{1}{(m+t)^{2s}}\cdot\frac{m+t}{m+1+t}\,dt.$$

Remark that $J_m(s) = \frac{1}{m^{2s}}\left[1 + O\!\left(\frac{1}{m}\right)\right]$, with a $O$ uniform for $s$ near 1.

The Mellin transform of $F_s(w) = F(s,w)$ is then explicit and given by

$$(-\log 2)\cdot F_s^{*}(u) = \Gamma(u)\cdot B(s,u) \quad\text{with}\quad B(s,u) := \sum_{m\ge 1} \frac{1}{m^{u-1}}\,J_m(s).$$

Note that $B(s,u) = \zeta(2s + u - 1) + C(s,u)$ with $|C(s,u)| \le K\,|\zeta(2s + u)|$. Then,
the poles of $F_s^{*}$ near 0 are $u = 0$ and $u = 2 - 2s$, with $\mathrm{Res}[B(s,u);\ u = 2 - 2s] = 1$.
Finally,

$$(-\log 2)\cdot F(s,w) = B(s,0) + \Gamma(2 - 2s)\cdot w^{2s-2} + O(|s - 1|) + O(w^{1/2}).$$

Now, when $s$ is near 1, one has

$$B(s,0) = \zeta(2s - 1) + O(1) = \frac{1}{2(s-1)} + O(1), \qquad \Gamma(2 - 2s) = -\frac{1}{2(s-1)} + O(1).$$

We now use the same arguments as in Lemma 5.3: when $(s(z), w(z))$ tends to $(1,0)$
on the curve $\mathcal{C}$, the quantity $(s - 1)\log w$ tends to 0, and the relation proves (i).
We omit the proof of (ii) (see [6]). For (iii), the difference of interest is a series
whose general term is the product of $-m\exp[-wm]$ by

$$\int_0^1 \frac{1}{(m+t)^{2s}}\, f_{s,w}\!\left(\frac{1}{m+t}\right) d\nu_{s,w}(t) - \int_0^1 \frac{1}{(m+t)^{2s}}\, f_1\!\left(\frac{1}{m+t}\right) dt.$$

Then, with the distortion property,

$$|\widetilde\lambda'_w(s,w) - F(s,w)| = O\bigl(\|f_{s,w} - f_1\|_1 + \|\nu_{s,w} - \nu_1\|_1\bigr)\cdot\left(\sum_{m=1}^{\infty} \frac{m\,e^{-wm}}{m^{2s}}\right).$$







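The second factor is equal, up to a multiplicative factor, to the derivative $\lambda'_w(s, w)$
of the dominant eigenvalue of the approximate model $\mathcal{BR}(2)$.
Relation (28) proves that this factor is $O(\log z)$ when $(s(z), w(z))$ tends to $(1,0)$ on
the curve $\mathcal{C}$. Using (ii) ends the proof of (iii).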

End of the proof of Theorem 1.3. We know now that w x more precisely 

that w = H- 0(z^^^^“^)]. We conclude with Lemma 5.1. 



6. Conclusions and Open Problems 

The two main results of this paper (Theorem 1.1 and Theorem 1.3) are of different
nature. Theorem 1.1 is a general result which shows that the Hausdorff dimension
of a wide class of sets can be characterized in terms of a solution of a differential
system which involves the dominant eigenvalue of the weighted transfer operator.
It deals with triples $(I, T, c)$ of large growth and is, in a sense, complementary
to the multifractal result of [12], which is only obtained in the case of a triple of
moderate growth. Is it possible to obtain our result in the GMG-setting, and the
result of [12] in the GLG-setting?

On the other hand, Theorems 1.2 and 1.3 deal with particular cases and
provide precise asymptotic estimates for the dimension of the set $F_M$, particularly in the
continued fraction context. Theorem 1.3 proves that the characterization given in
Theorem 1.1 is useful for effective computations, even for dynamical systems with
memory. The main idea is to relate systems with memory to memoryless schemes
which approximate them. It is clear that these “approximation” techniques are
applicable to more general instances of dynamical systems.

Finally, Theorem 1.1 together with the results of Hanus, Mauldin and Urbański
[12] poses an important question: Is it possible to describe a general framework
where systematic proven computation of dimensions can be provided? In the case
when the constraints deal with each digit in an independent way, the algorithm
proposed by Daudé, Flajolet, Vallée [7], used in [32] and justified by Lhote [25],
provides (in polynomial time) proven numerical values for the Hausdorff dimension.
The present case is certainly more difficult since a system (of two equations)
now has to be solved, whereas there was previously a unique equation to solve.

We precisely described the sets of reals whose continued fraction expansion has
all its prefix digit averages less than $M$. For performing the precise analysis of
the Euclidean subtractive algorithm (see [33, 34]), one needs precise information
on the set of rational numbers whose digits in the continued fraction expansion
have an average less than $M$. This discrete problem is more difficult to solve than
the present continuous one. In the case of “fast Euclidean Algorithms”, relative to
costs of moderate growth, the weighted transfer operator $H_{s,w}$ is analytic at the
reference point $(s,w) = (1,0)$. Then, Tauberian Theorems or Perron’s Formula
[33, 34, 2] allow a transfer “from continuous to discrete”. Here, it does not seem
possible to use these tools directly, due to the non-analyticity of $H_{s,w}$ at $(1,0)$.



References 

[1] Baladi, V., Positive Transfer operators and decay of correlations, Advanced Series
in non linear dynamics, World Scientific, 2000.

[2] Baladi, V., and Vallee, B., Distributional analyses of Euclidean algorithms, to 
appear in Proceedings of ANALCO’04, also Les Cahiers du GREYC, Universite de 
Caen, 2004. 







[3] Baladi, V., and Vallee, B., Euclidean algorithms are Gaussian, available from
the ArXiv, submitted, also Les Cahiers du GREYC 2004.

[4] Besicovitch, A.S., Sets of fractional dimensions: On rational approximation to 
real numbers. J. London Math. Society, vol 9, pp. 126-131, 1934. 

[5] Billingsley, P., Probability and Measure, John Wiley & Sons, 1979.

[6] Cesaratto, E. and Vallee, B. Hausdorff dimension of real numbers with bounded 
digit averages. Long version of this paper, les Cahiers du GREYC 2004, Submitted. 

[7] Daude, H., Flajolet, P., Vallee B., An average-case analysis of the Gaussian 
Algorithm for lattice reduction Combinatorics, Probability and Computing No 6 pp 
397-433, 1997. 

[8] Eggleston, H., The fractional dimension of a set defined by decimal properties.
Quarterly Journal of Mathematics, Oxford Series, 20, pp. 31-36, 1949.

[9] Falconer, K., Fractal Geometry - Mathematical Foundations and Applications,
John Wiley & Sons, New York, 1990.

[10] Falconer, K., Techniques in Fractal Geometry, John Wiley & Sons, New York,
1997.

[11] Good, I. J. The fractional dimensional theory of continued fractions. Math. Proc. 
Cambridge Philos. Soc. 37 (1941) pp 199-228. 

[12] Hanus, P., Mauldin, D., Urbański, M., Thermodynamic formalism and
multifractal analysis of conformal infinite iterated function systems. Acta Math. Hungarica,
Vol. 96, pp. 27-98, 2002.

[13] Hensley, D., The Hausdorff dimensions of some continued fraction Cantor sets. 
Journal of Number theory 33, (1989) pp 182-198 

[14] Hensley, D., Continued Fraction Cantor sets, Hausdorff dimension, and functional
analysis. Journal of Number Theory 40 (1992) pp 336-358.

[15] Hensley, D., A polynomial time algorithm for the Hausdorff dimension of a con- 
tinued fraction Cantor set. Journal of Number Theory 58 pp 9-45, 1996. 

[16] Hensley, D., The statistics of the continued fraction digit sum. Pacific Journal of 
Mathematics, Vol. 192, No2, 2000. 

[17] Hwang, H.-K., Theoremes limite pour les structures combinatoires et les fonctions
arithmetiques, PhD thesis, Ecole Polytechnique, Dec. 1994.

[18] Hwang, H.-K., Large deviations for combinatorial distributions: I. Central limit 
theorems. The Annals of Applied Probability 6 (1996) 297-319. 

[19] Hwang, H.-K., On convergence rates in the central limit theorems for combinatorial 
structures, European Journal of Combinatorics 19 (1998) 329-343. 

[20] Jarnik, V., Zur metrischen Theorie der diophantischen Approximationen. Prace 
Mat. Fiz. 36, pp. 91-106, 1928. 

[21] Jarnik, V., Uber die simultanen diophantischen Approximationen. Math. Zeit., 33, 
pp. 505-543, 1931. 

[22] Kato, T., Perturbation Theory for Linear Operators, Springer- Verlag, 1980. 

[23] Khinchin, A. I., Continued Fractions, Dover Publications, Mineola, New York, 1997.

[24] Lasota, A., Mackey, M., Chaos, Fractals and Noise; Stochastic Aspects of
Dynamics, Applied Mathematical Science 97, Springer, 1994.

[25] Lhote, L., Computation of a Class of Continued Fraction Constants, Proceedings
of ANALCO’04, to appear.

[26] Mandelbrot, B., Intermittent turbulence in self-similar cascades: divergence of 
high moments and dimension of the carrier, J. of Fluid Mech. 62, pp. 331-358, 1974. 

[27] Mauldin, D., Urbanski, M. Dimensions and measures in infinite iterated function 
systems. Proc. London Math. Soc. (3) 73, pp 105-154, 1996. 







[28] Mauldin, D., Urbanski, M. Conformal iterated function systems with applications 
to the geometry of continued fractions, Trans. Amer. Math. Soc. 351 (1999) pp 4995- 
5025. 

[29] Peyriere, J., Calculs de dimensions de Hausdorff, Duke Math. J. Vol. 44., No 3, 
pp. 591-600, 1977. 

[30] Pesin, Y. B., Dimension Theory in dynamical systems: contemporary views and
applications, Chicago Lectures in Math., The University of Chicago Press (1997).

[31] Shallit, J., Real numbers with bounded partial quotients. A survey.
L’Enseignement Mathematique, t. 38, pp 151-187, 1992.

[32] Vallee, B., Fractions Continues a contraintes periodiques. Journal of Number 
Theory , 72, pp. 183-235, 1998. 

[33] Vallee, B., Dynamical Analysis of a Class of Euclidean Algorithms, Theoretical 
Computer Science vol 297/1-3 (2003) 447-486. 

[34] Vallee, B., Digits and Continuants in Euclidean Algorithms: Ergodic vs Tauberian
Theorems. Journal de Theorie des Nombres de Bordeaux pp 531-570, 2000.

Eda Cesaratto

Facultad de Ingeniería, Universidad de Buenos Aires, Paseo Colon 850, Buenos
Aires, Argentina. ecesara@fi.uba.ar

Brigitte Vallee

CNRS UMR 6072, GREYC, Universite de Caen, F-14032 Caen, France

brigitte.vallee@info.unicaen.fr




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Large Deviation Analysis of Space-Time Trellis 
Codes 

Adriana Climescu-Haulica 



ABSTRACT: A method for computing the probability of symbol error for
space-time trellis codes is developed by means of results from large deviation theory
for quadratic Gaussian forms. The point is that, in order to build a space-time trellis
code with small probability of error, it is sufficient to find a metric for the Viterbi
decoding algorithm for which there exists a Cramér rate function which minimizes
a criterion on the transfer function of the trellis system. In particular, this approach
gives a theoretical explanation for a simulation result of [2].

Space-time coding is appropriate to multiple input multiple output
antenna transmission. Assume a communication channel with $n$ transmit and $m$
receive antennas. Let $\mathcal{A}$ be the alphabet of information symbols and let $L \in \mathbb{N}^*$
denote the information stream duration. A space-time encoder is a bijective
application $\mathcal{E} : \mathcal{A}^L \to \mathcal{C}$, where $\mathcal{C}$ is a subset of the set of $\mathbb{N}$-valued $n \times L$ matrices
and denotes the set of codewords. The encoder associated with a trellis code is a
finite-memory system: its output depends on the input sequence and on the state
of the device. Its behavior is completely described by a signal flow graph
admitting a rational transfer function. Figure 1 shows an example of an 8-state space-time
trellis code and its associated signal flow graph.

At time $l$ the code symbols $c_l^i$, $1 \le i \le n$, associated with an information
symbol are transmitted simultaneously from the corresponding antennas. The
signal at each receive antenna is a noisy superposition of the channel-distorted
versions of the $n$ transmitted signals. Assume the same time-invariant channel
model as the one considered in [3], where the construction of space-time trellis codes
was first derived:

$$r_l^j = \sum_{i=1}^{n} h_{ij}\,c_l^i + z_l^j,$$

where $r_l^j$ is the signal received at instant $l$ by the $j$-th antenna, $j = 1, \ldots, m$,
$H = (h_{ij})$ is an $n \times m$ random matrix describing the fading channel whose elements
are complex Gaussian random variables, circularly distributed with zero mean
and variance one, $c = (c_l^i)$ denotes the transmitted codeword and each $z_l^j$ is a zero
mean, complex Gaussian random variable, circularly distributed.

Let $T$ denote the transfer function associated with the space-time trellis. It
is known that

\[ \left. \frac{\partial T(x,y)}{\partial x} \right|_{x=1} = F(y), \qquad F(y) = \sum_{d \ge d_f} A_d\, y^d , \]





Figure 0.1 Example of space-time trellis code and its state diagram 



where $d_f$ denotes the free Hamming distance of the code and $A_d$ is the number of
codewords with Hamming weight $d$. Let $E_d$ denote the event that the error arises
from a path with Hamming weight $d$. Assuming the trellis is regular and such that
no path re-merges with the all-zero path after having diverged from it at some
point in the past, the probability of an error event satisfies

\[ P(E) = P\Big( \bigcup_{d} E_d \Big) = \sum_{d=d_f}^{L} P(E_d) = \sum_{d=d_f}^{L} P\big( (\hat c \ne c) \cap (d_H(\hat c, c) = d) \big) , \]

where $\hat c = (\hat c_l^i)$ denotes the codeword detected from the received signal by a
soft-decision Viterbi algorithm based on the Euclidean metric, and $d_H(\cdot,\cdot)$ denotes the
Hamming distance. The Euclidean distance between $c$ and $\hat c$ is defined as

\[ d_E^2(\hat c, c) = \sum_{l=1}^{L} \big\| H^* (\hat c_l - c_l) \big\|^2 , \]

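where $*$ denotes Hermitian transposition and $c_l$ is the $l$-th column of $c$. Expressing
the Hermitian matrix $K_l = (\hat c_l - c_l)(\hat c_l - c_l)^*$ by means of a unitary matrix $U_l$ as
$K_l = U_l \Lambda_l U_l^*$, the Euclidean metric above becomes

\[ d_E^2(\hat c, c) = \sum_{l=1}^{L} \sum_{j=1}^{m} h_{\cdot j}^{*}\, K_l\, h_{\cdot j} = \sum_{l=1}^{L} \sum_{j=1}^{m} \beta_j(l)^{*}\, \Lambda_l\, \beta_j(l) , \]

where $h_{\cdot j}$ is the $j$-th column of $H$ and $\beta_j(l) = U_l^{*} h_{\cdot j}$ is a complex Gaussian vector
with zero mean and identity covariance matrix. It follows that

\[ d_E^2(\hat c, c) = \sum_{l=1}^{L} \sum_{j=1}^{m} \sum_{i=1}^{n} \lambda_l^i\, |\beta_j^i(l)|^2 , \]

with $\lambda_l^i$, $i = 1, \dots, n$, the eigenvalues of the matrix $K_l$ and each $|\beta_j^i(l)|^2$ a $\chi^2(2)$
random variable. When $d_H(\hat c, c) = d$, the Euclidean distance can be expressed as
a function of $d$: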







\[ d_E^2(d) = \sum_{l=1}^{L(d)} \sum_{i=1}^{n} \lambda_l^i(d)\, \beta_l^i(d) , \]

with $\beta_l^i(d)$ a $\chi^2(2m)$ random variable. When $d$ goes to infinity, $(L(d))$ is a sequence
of integers increasing to infinity.
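The step from the norm $\| H^*(\hat c_l - c_l) \|^2$ to a sum of quadratic forms in the columns of $H$ can be checked numerically. The following is a minimal sketch in plain Python, not the author's procedure: the function name, the sizes `n` and `m`, and the random draws are all illustrative assumptions, and the Gaussian entries are not normalised to variance one since the identity does not depend on it.

```python
import random

def quadratic_form_check(n=3, m=2, seed=0):
    """Compare ||H^* e||^2 with sum_j h_j^* K h_j for K = e e^* (illustrative draw)."""
    rng = random.Random(seed)
    # H is an n x m complex matrix standing in for one channel realisation.
    H = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(m)]
         for _ in range(n)]
    # e plays the role of the difference between detected and sent symbols at one time.
    e = [complex(rng.randint(-2, 2), rng.randint(-2, 2)) for _ in range(n)]
    # K = e e^* is an n x n Hermitian matrix.
    K = [[e[a] * e[b].conjugate() for b in range(n)] for a in range(n)]
    # Left side: squared norm of the m-vector H^* e.
    lhs = sum(abs(sum(H[i][j].conjugate() * e[i] for i in range(n))) ** 2
              for j in range(m))
    # Right side: sum over columns h_j of the quadratic form h_j^* K h_j.
    rhs = sum(H[a][j].conjugate() * K[a][b] * H[b][j]
              for j in range(m) for a in range(n) for b in range(n))
    return lhs, rhs

lhs, rhs = quadratic_form_check()
print(abs(lhs - rhs))  # difference is at rounding-error level
```

Diagonalising $K_l$ then only changes the basis in which each quadratic form is written, which is how the eigenvalues enter the sum.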

Following ideas from [1], a large deviation principle is proved for the quadratic
Gaussian form expressed in the above relation, under some constraints on the family
of eigenvalues $(\lambda_l^i(d))$. This result is also an extension of the Gärtner-Ellis theorem.
It follows that there exists a rate function $J : [0, +\infty) \to \mathbb{R}$ such that



As

\[ P\big( (\hat c \ne c) \cap (d_H(\hat c, c) = d) \big) = P\Big( \sum_{l=1}^{L(d)} \sum_{i=1}^{n} \lambda_l^i(d)\, \beta_l^i(d) > 0 \Big) , \]



the conclusion is that 



P{E) ^ F for small e > 0. 

This result explains that the better performance obtained in [2] for the (8,4,4)
Reed-Muller code, where the codewords are split evenly between two transmit antennas,
compared with a usual 1/2 space-time code, is due to the behavior of their trellis
transfer functions with respect to the Euclidean norm.

References 

[1] B. Bercu, F. Gamboa, and A. Rouault, Large deviations for quadratic forms of
stationary Gaussian processes, Stoch. Proc. and Appl., 71 (1997) 75-90

[2] E. Biglieri, G. Taricco, and A. Tulino, Performance of Space-Time Codes for a Large
Number of Antennas, IEEE Trans. Inform. Theory, vol. 48, July 2002, 1794-1803

[3] V. Tarokh, N. Seshadri, and A.R. Calderbank, Space-Time codes for high data rate
wireless communication, IEEE Trans. Inform. Theory, vol. 44, March 1998, 744-765

Adriana Climescu-Haulica 

Laboratoire de Modélisation et Calcul, Institut d'Informatique et Mathématiques
Appliquées de Grenoble

51, rue des Mathématiques, 38041 Grenoble cedex 9, France
email: adriana.climescu@imag.ca







A Zero-One Law for First-Order Logic on
Random Images

David Coupier, Agnès Desolneux, and Bernard Ycart



ABSTRACT: For an $n \times n$ random image with independent pixels, black with
probability $p(n)$ and white with probability $1 - p(n)$, the probability of satisfying
any given first-order sentence tends to 0 or 1, provided both $p(n)\, n^{2/k}$ and
$(1 - p(n))\, n^{2/k}$ tend to 0 or $+\infty$, for any positive integer $k$. The result is proved by computing
the threshold function for basic local sentences, and applying Gaifman's theorem.



1. Introduction

The motivation for this work came from the Gestalt theory of vision (see [3] and
references therein), a basic idea of which is that the human eye focuses first on 
remarkable or unusual features of an image, i.e. features that would have a low 
probability of occurring if the image were random. Hence the natural question: 
which properties of a random image have a low or high probability? Here we shall 
deal with the simplest model for random images: 

Definition 1.1. Let $n$ be a positive integer. Consider the set $X_n = \{1, \dots, n\}^2$, called
the pixel set. An image of size $n \times n$ is a mapping from $X_n$ to $\{0, 1\}$ (white or black).
Their set is denoted by $E_n$. It is endowed with the product of independent copies
of the Bernoulli distribution with parameter $p$, which will be denoted by $\mu_{n,p}$:

\[ \forall \eta \in E_n , \quad \mu_{n,p}(\eta) = \prod_{i,j=1}^{n} p^{\eta(i,j)} (1-p)^{1-\eta(i,j)} . \]

A random image of size $n \times n$ and level $p$, denoted by $\mathcal{I}_{n,p}$, is a random element of
$E_n$ with distribution $\mu_{n,p}$.

In other words, a random image of size nxn and level p is a square image 
in which all pixels are independent, each being black with probability p or white 
with probability 1 —p. 
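As an illustration of the model, here is a minimal Python sketch that draws a random image and evaluates the product measure of a given image. The function names and the list-of-lists representation are assumptions made for this example only.

```python
import random

def random_image(n, p, rng):
    """Draw an n x n image: each pixel is 1 (black) with probability p, else 0 (white)."""
    return [[1 if rng.random() < p else 0 for _ in range(n)] for _ in range(n)]

def image_probability(image, p):
    """Product Bernoulli measure of one image: p^(#black) * (1 - p)^(#white)."""
    black = sum(map(sum, image))
    white = len(image) ** 2 - black
    return p ** black * (1 - p) ** white

rng = random.Random(2004)
eta = random_image(4, 0.3, rng)
print(eta, image_probability(eta, 0.3))
```

Summing `image_probability` over all $2^{n^2}$ images of a given size gives 1, which is a quick sanity check that the measure is a probability.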

We shall use the elementary definitions and concepts of first-order logic on 
finite models, such as described for instance in Ebbinghaus and Flum [4]. Gaif- 
man’s theorem ([8] and [4] p. 31) shows that first-order sentences are essentially 
local. They can be logically reduced to the appearance of fixed sub-images (pre- 
cise definitions will be given in section 2). Assume p is fixed. Then as n tends to 
infinity, any given sub-image of fixed size should appear somewhere in the random 
image with probability tending to 1: this is the two dimensional version of 
the well known “typing monkey” paradox. It justifies intuitively that the zero-one 
law should hold for fixed values of p. Our main result is more general. 







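Theorem 1.2. Let $p(n)$ be a function from $\mathbb{N}$ into $[0, 1]$ such that:

\[ \forall k = 1, 2, \dots , \quad \lim_{n \to \infty} n^{2/k} p(n) = 0 \text{ or } +\infty \quad \text{and} \quad \lim_{n \to \infty} n^{2/k} (1 - p(n)) = 0 \text{ or } +\infty . \]

Let $A$ be a first-order sentence. Then:

\[ \lim_{n \to \infty} \mathrm{Prob}[\mathcal{I}_{n,p(n)} \models A] = 0 \text{ or } 1 . \]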

Zero-one laws have a long history (cf. Compton [2] for a review and chapter 3 
of [4]). The first of them was proved independently by Glebskii et al. [9] and Fagin 
[6]. It applied to the first-order logic on a finite universe without constraints, 
and uniform probability. As an example, interpret the elements of En as directed 
graphs with vertex set {1, . . . ,n}, by putting an edge between i and j if pixel 
$(i, j)$ is black. Then $\mathcal{I}_{n,p}$ becomes a random directed graph (or digraph) with
edge probability p (see for instance [11, 12], or [1] for a general reference). As a 
particular case of the Glebskii et al. - Fagin theorem, the zero-one law holds for 
first-order propositions on random digraphs. However, first-order logic on images 
is more expressive than on digraphs, since the geometry of images is not preserved 
in the graph interpretation. For instance, “there exists a horizontal segment of 
5 neighboring black pixels” is a first order sentence on images, not on digraphs, 
where neither “horizontal” nor “neighbor” can be expressed. 

The theory of random (undirected) graphs was inaugurated by Erdős and
Rényi [5] (see [1, 16] for general references). The zero-one law holds for random
graphs with edge probability $p$, as a consequence of Oberschelp's theorem [13]
on parametric classes (see [4] p. 74 or [16] p. 318). At first, zero-one laws were
essentially combinatorial, as they applied to the uniform probability on the set
of all structures, corresponding to edge probability $p = \frac{1}{2}$ in the case of graphs.
It was soon noticed that they also hold for any fixed value of $p$. But it is well
known that random graphs become more interesting by letting $p = p(n)$ tend to
0 as $n$ tends to infinity. A crucial notion for random graphs is the appearance
of given subgraphs ([16] p. 309). The threshold function for the appearance of a
given subgraph in a random graph is $p(n) = n^{-v/e}$, where $v$ and $e$ are integers. For
$p(n) = n^{-v/e}$ the probability of appearance for certain subgraphs does not tend to
0 or 1. Using the extension technique ([7, 6] and [4] p. 73), Shelah and Spencer
[15] made a complete study of those functions $p(n)$ for which the zero-one law
holds for random graphs, and proved in particular that it does for $p(n) = n^{-\alpha}$ for
any irrational $\alpha$. Theorem 1.2 is the analogue for random images of Shelah and
Spencer's result. To understand why, first notice that the random image model
is invariant through exchanging black and white, together with $p$ and $1 - p$. Thus
we will consider only functions $p(n)$ tending to 0. We shall define precisely the
notion of threshold function in section 3, and prove that all threshold functions for
patterns are of type $p(n) = n^{-2/k}$: the zero-one law does not hold for these values.
For instance, if $p(n)$ is small (resp.: large) compared to $n^{-2}$, the probability of
having at least one black pixel tends to 0 (resp.: 1). But for $p(n) = n^{-2}$, it tends
to $1 - e^{-1}$. Theorem 1.2 essentially says that the zero-one law holds for any function
$p(n)$ which is not a threshold function.
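The behaviour of "there exists a black pixel" around $n^{-2}$ can be checked directly, since the probability of at least one black pixel among $n^2$ independent pixels is $1 - (1-p)^{n^2}$. The sketch below is only an illustration; the function name and the three sample exponents are choices made for this example.

```python
import math

def prob_some_black(n, p):
    """Probability that an n x n random image of level p has at least one black pixel."""
    return 1.0 - (1.0 - p) ** (n * n)

for n in (10, 100, 1000):
    below = prob_some_black(n, n ** -3)   # p(n) small compared to n^-2
    at = prob_some_black(n, n ** -2)      # p(n) equal to n^-2
    above = prob_some_black(n, n ** -1)   # p(n) large compared to n^-2
    print(n, below, at, above)

print("limit at the threshold:", 1 - math.exp(-1))
```

As $n$ grows the first column goes to 0, the last to 1, and the middle one stays near $1 - e^{-1}$, so no zero-one law can hold at $p(n) = n^{-2}$.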

It is worth pointing out here that theorem 1.2 can be extended to other 
random structures, along two different directions. Firstly, we chose to restrict the 
study to binary images, using a single unary relation in the language (cf. section 2) . 
With slight modifications of the proofs, and the values of threshold functions, one 
could introduce a finite set of “color” unary relations, allowing for the coding of 







multilevel gray or color images. The other possible generalisation concerns the type 
of graphs. An image is essentially a colored square lattice. The crucial property 
of that graph for our proof is that there exists a fixed number of vertices at fixed 
distance of any vertex (balls have bounded cardinality). Our study extends to any 
family of graphs with bounded balls. For instance, theorem 1.2 also holds for a
randomly colored $d$-dimensional square lattice with $n^d$ points, up to replacing
$n^{2/k}$ by $n^{d/k}$ in its statement.

Section 2 is devoted to first-order logic on images. There we shall discuss basic 
local sentences (definition 2.2 and [4] p. 31), and reduce them to combinations of 
“pattern sentences” (definition 2.3), showing that a zero-one law holds for all first- 
order sentences if it holds for basic local or pattern sentences (proposition 2.4). 
This will trivially imply that theorem 1.2 holds for fixed values of $p$. The section
will end with two examples of (second-order) sentences whose probability under
$\mu_{n,1/2}$ tends to $\frac{1}{2}$.

In section 3, we shall define the notion of threshold function (definition 3.2)
and prove that all threshold functions for basic local sentences are of type $n^{-2/k}$
(proposition 3.4). Theorem 1.2 easily follows from propositions 2.4 and 3.4.



2. First-order logic for images 

We shall follow the notations and definitions in chapter 0 of [4] for the syntax and 
semantics of first-order logic. The vocabulary is the set of relations (or predicates). 
They apply to the universe (or domain) . In our case the universe will be the pixel 
set Xn. Image properties will not only be statements on colors of pixels but also 
about their geometrical arrangement. Our vocabulary will consist of 1 unary and
4 binary relations. The unary relation $C$ is interpreted as the color: $Cx$ means that
$x$ is a black pixel and $\neg Cx$ that it is white. Before defining the binary relations,
the geometry of $X_n$ needs to be made precise.

The pixel set $X_n$ is embedded in $\mathbb{Z}^2$, and naturally endowed with a graph
structure. In image analysis (see for instance chapter 6 of Serra [14]), the cases
most often considered are:

• the 4-connectivity. For $i, j > 0$, the neighbors of $(i, j)$ are:

$(i+1, j),\ (i-1, j),\ (i, j+1),\ (i, j-1)$.

• the 8-connectivity. The 4 diagonal neighbors are also included:

$(i+1, j+1),\ (i-1, j+1),\ (i+1, j-1),\ (i-1, j-1)$.

At this point a few words about the borders are needed. In order to avoid particular
cases (pixels having fewer than 4 or 8 neighbors), we shall impose a periodic
boundary, deciding for instance that $(1, j)$ is neighbor with $(n, j)$, $(n, j-1)$, and
$(n, j+1)$, so that the graph becomes a regular 2-dimensional torus. Although it
may seem somewhat unnatural for images, without that assumption the zero-one
law would fail. Consider indeed the (first-order) sentence "there exist 4 black pixels
each having only one horizontal neighbor". Without periodic boundary conditions,
it applies to the 4 corners, and the probability for a random image $\mathcal{I}_{n,p}$ to satisfy it
is $p^4$. From now on, the identification $n + 1 \equiv 1$ holds for all operations on pixels.

Once the graph structure is fixed, the relative positions of pixels can be
described by binary predicates. In the case of 4-connectivity 2 binary predicates
suffice, $U$ (up) and $R$ (right): $Uxy$ means that $y = x + (0, 1)$ and $Rxy$ that
$y = x + (1, 0)$. In the case of 8-connectivity, two more predicates must be added,







$D_1$ and $D_2$: $D_1xy$ means that $y = x + (1, 1)$ and $D_2xy$ that $y = x + (1, -1)$.
For convenience, we shall stick to 8-connectivity. Thus the vocabulary of
images is the set $\{C, U, R, D_1, D_2\}$. Once the universe and the vocabulary are fixed,
the structures are particular models of the relations, applied to variables in the
domain. To any structure, a graph is naturally associated ([4] p. 26), connecting
those pairs of elements $\{x, y\}$ which are such that $Sxy$ or $Syx$ is satisfied, where
$S$ is any of the binary relations. Of course only those structures for which the
associated graph is the square lattice with diagonals and periodic boundaries will
be called images. As usual, the graph distance $d$ is defined as the minimal length
of a path between two pixels. We shall denote by $B(x, r)$ the ball of center $x$ and
radius $r$:

\[ B(x, r) = \{ y \in X_n \,;\ d(x, y) \le r \} . \]

In the case of 8-connectivity, $B(x, r)$ is a square containing $(2r + 1)^2$ pixels.

Formulas such as $Cx$, $Uxy$, $Rxy$, ... are called atoms. The first-order logic
([4] p. 5) is the set of all formulas obtained by recursively combining first-order
formulas, starting with atoms.

Definition 2.1. The set $\mathcal{L}_1$ of first-order formulas is defined by:

(i) All atoms belong to $\mathcal{L}_1$.

(ii) If $A$ and $B$ are first-order formulas, then $(\neg A)$, $(\forall x\, Ax)$ and $(A \wedge B)$ also
belong to $\mathcal{L}_1$.

Here are two examples of first-order formulas:

(i) $\forall x, y, z,\ (Rxy \wedge Uyz) \rightarrow D_1xz$,

(ii) $(\exists y\, (Rxy \wedge Uyz)) \leftrightarrow D_1xz$.

Notice that any image satisfies them both: adding the two diagonal relations $D_1$
and $D_2$ does not make the language any more expressive. The only reason why
the 8-connectivity was preferred here is that the corresponding balls are squares.
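The claim that every image satisfies formulas relating $R$, $U$ and $D_1$, and the fact that balls are squares under 8-connectivity, depend only on the torus geometry and can be checked by brute force on a small pixel set. The following Python sketch is an illustration under stated assumptions: pixels are 0-based pairs, the function names are hypothetical, and the second formula is read with a biconditional.

```python
from itertools import product

def torus_predicates(n):
    """Return the predicates U, R, D1 on the n x n torus, pixels being 0-based pairs."""
    def shifted(x, d):
        return ((x[0] + d[0]) % n, (x[1] + d[1]) % n)
    def U(x, y):
        return y == shifted(x, (0, 1))
    def R(x, y):
        return y == shifted(x, (1, 0))
    def D1(x, y):
        return y == shifted(x, (1, 1))
    return U, R, D1

def check_formulas(n):
    """Check (Rxy and Uyz) -> D1xz for all x, y, z, and (exists y: Rxy and Uyz) <-> D1xz."""
    U, R, D1 = torus_predicates(n)
    pixels = list(product(range(n), repeat=2))
    first = all((not (R(x, y) and U(y, z))) or D1(x, z)
                for x in pixels for y in pixels for z in pixels)
    second = all(any(R(x, y) and U(y, z) for y in pixels) == D1(x, z)
                 for x in pixels for z in pixels)
    return first, second

def ball_size(n, r):
    """Number of pixels within 8-connectivity graph distance r of pixel (0, 0) on the torus."""
    def circular(a):
        return min(a % n, (-a) % n)
    return sum(1 for i, j in product(range(n), repeat=2)
               if max(circular(i), circular(j)) <= r)

print(check_formulas(4), ball_size(7, 2))
```

The ball count uses the fact that, with diagonal moves allowed, the graph distance on the torus is the larger of the two circular coordinate differences.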

We are interested in formulas for which it can be decided if they are true or
false for any given image, i.e. for which all variables are quantified. They are called
closed formulas, or sentences. Such a sentence $A$ defines a subset $A_n$ of $E_n$: that
of all images $\eta$ that satisfy $A$ ($\eta \models A$). Its probability for $\mu_{n,p}$ will still be denoted
by $\mu_{n,p}(A)$:

\[ \mu_{n,p}(A) = \mathrm{Prob}[\mathcal{I}_{n,p} \models A] = \sum_{\eta \models A} \mu_{n,p}(\eta) . \]

Gaifman's theorem ([4] p. 31) states that every first-order sentence is equivalent
to a boolean combination of basic local sentences.

Definition 2.2. A basic local sentence has the form:

\[ \exists x_1 \dots \exists x_m \Big( \Big( \bigwedge_{1 \le i < j \le m} d(x_i, x_j) > 2r \Big) \wedge \Big( \bigwedge_{1 \le i \le m} \psi_i(x_i) \Big) \Big) , \qquad (1) \]

where:

• $m$ and $r$ are fixed nonnegative integers,

• for all $i = 1, \dots, m$, $\psi_i(x) \in \mathcal{L}_1$ is a formula for which only variable $x$ is
free (not bound by a quantifier), and the other variables all belong to the
ball $B(x, r)$.







For any $x$ and a fixed radius $r$, consider now a complete description $D(x)$
of the ball $B(x, r)$, i.e. a first-order sentence for which all statements concerning
pixels at distance at most $r$ of $x$ are either asserted or negated. There exists a
single image $I_D$ of size $(2r + 1) \times (2r + 1)$, centered at $x$, satisfying it. Thus $D(x)$
can be interpreted as: "the pattern of pixels at distance at most $r$ of $x$ is $I_D$".



Definition 2.3. A pattern sentence has the form:

\[ \exists x_1 \dots \exists x_m \Big( \Big( \bigwedge_{1 \le i < j \le m} d(x_i, x_j) > 2r \Big) \wedge \Big( \bigwedge_{1 \le i \le m} D_i(x_i) \Big) \Big) , \qquad (2) \]

where:

• $m$ and $r$ are fixed nonnegative integers,

• for all $i = 1, \dots, m$, $D_i(x)$ is a complete description of the ball $B(x, r)$.



Examples of (interpreted) pattern sentences are: 

(i) “there exist 3 black pixels” , 

(ii) “there exists a 3 x 3 white square” , 

(iii) “there exist 3 non overlapping 5x5 white squares with a black pixel on 
the center” . 

Figure 2.1 gives another illustration. Obviously, pattern sentences are particular
cases of basic local sentences. Proposition 2.4 below reduces the proof of zero-one
laws for random images to pattern sentences.

Figure 2.1 Illustration of a pattern sentence, for m = 4 and r = 1.

Proposition 2.4. Consider the following three assertions. 

(i) The probability of any pattern sentence tends to 0 or 1. 

(ii) The probability of any basic local sentence tends to 0 or 1. 

(iii) The probability of any first order sentence tends to 0 or 1. 

Then (i) implies (ii) and (ii) implies (iii). 







Proof. Observe first that if the probabilities of sentences $A$ and $B$ tend to 0 or
1, then so do the probabilities of $\neg A$ and $A \wedge B$. This follows from elementary
properties of probabilities. As a consequence, if the probability of $A$ tends to 0 or
1 for any $A$ in a given family, this remains true for any finite boolean combination of
sentences in that family. Thus Gaifman's theorem yields that (ii) implies (iii). We
shall prove now that every basic local sentence is either unsatisfiable or a finite
boolean combination of pattern sentences. Indeed, consider a formula $\psi(x)$ for
which only variable $x$ is free, and the other variables all belong to the ball $B(x, r)$.
Either it is not satisfiable, or there exists a finite set of $(2r + 1) \times (2r + 1)$ images
(at most $2^{(2r+1)^2}$) which satisfy it. To each of those images corresponds a complete
description $D(x)$ which implies $\psi(x)$. So $\psi(x)$ is equivalent to the disjunction of
these $D(x)$'s:

\[ \psi(x) \leftrightarrow \bigvee_{D(x) \rightarrow \psi(x)} D(x) . \qquad (3) \]

In formula (1), one can replace each $\psi_i(x_i)$ by a disjunction of complete descriptions.
Rearranging terms, one sees that the basic local sentence (1) is itself a finite
disjunction of pattern sentences. □

The zero-one law for fixed values of $p$ is an easy consequence of proposition 2.4.
Indeed, for fixed $p$ with $0 < p < 1$, the probability of any pattern sentence tends to 1.
To see why, consider the following sentence:

\[ \exists x \Big( \bigwedge_{1 \le i \le m} D_i\big( x + ((i-1)(2r+1), 0) \big) \Big) , \qquad (4) \]

interpreted as: "sub-images $I_{D_1}, \dots, I_{D_m}$ appear in $m$ consecutive, horizontally
adjacent balls of radius $r$". It clearly implies (2). But (4) is equivalent to the
appearance of a given sub-image on a rectangle of size $(2r + 1) \times (m(2r + 1))$. This
occurs in a random image $\mathcal{I}_{n,p}$ with probability tending to 1 as $n$ tends to infinity.
Thus (2) has a probability tending to one of being satisfied by $\mathcal{I}_{n,p}$.

This section ends with two counter-examples of (second-order) sentences the
probability of which does not tend to 0 or 1. The first one is "the number of black
pixels is even". This is one of the basic examples of second-order sentences that do
not belong to first-order logic (see [4] example 1.3.4 p. 21 and p. 37). Its probability
is

\[ \sum_{k=0}^{\lfloor n^2/2 \rfloor} \binom{n^2}{2k} p^{2k} (1-p)^{n^2 - 2k} = \frac{1 + (1 - 2p)^{n^2}}{2} , \]

which tends to $\frac{1}{2}$ for any $p$ such that $0 < p < 1$.
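The parity probability is easy to evaluate numerically. The sketch below sums the even terms of the binomial distribution and compares the result with the standard closed form $(1 + (1-2p)^N)/2$ for $N$ independent pixels; the function names are illustrative choices.

```python
from math import comb

def prob_even_black(num_pixels, p):
    """Probability that the number of black pixels among num_pixels is even."""
    return sum(comb(num_pixels, j) * p ** j * (1 - p) ** (num_pixels - j)
               for j in range(0, num_pixels + 1, 2))

def closed_form(num_pixels, p):
    """Standard identity for the even part of a binomial distribution."""
    return (1 + (1 - 2 * p) ** num_pixels) / 2

for n in (1, 2, 3, 10):
    print(n, prob_even_black(n * n, 0.3), closed_form(n * n, 0.3))
```

Since $|1 - 2p| < 1$ whenever $0 < p < 1$, the correction term vanishes geometrically in $n^2$ and the probability settles at $\frac{1}{2}$.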

The second example is more relevant to images. Define a 6-connected path
as a path where the directions $(-1, 1)$ and $(1, -1)$ are forbidden, or more
precisely an $m$-tuple of pixels $(x_1, \dots, x_m)$, such that for $i = 1, \dots, m - 1$,
$x_{i+1} \in x_i \pm \{(1,0), (0,1), (1,1)\}$, and the borders of the image are not crossed (see an
illustration on figure 2.2). Consider now the two sentences:

(i) BLR: "there exists a 6-connected path of black pixels from left to right",

(ii) WTB: "there exists a 6-connected path of white pixels from top to bottom".







Some geometrical considerations show that an image satisfies BLR if and only if
it does not satisfy WTB (this would not hold for 4- or 8-connected paths: see [14]
p. 183). Take now $p = \frac{1}{2}$. Symmetry implies that

\[ \mu_{n,1/2}(\mathrm{BLR}) = \mu_{n,1/2}(\mathrm{WTB}) , \]

hence both probabilities must be equal to $\frac{1}{2}$.

The sentences BLR and WTB are examples of those properties studied by
percolation theory (see Grimmett [10] for a general reference). Actually the random
image model that we consider here is a finite approximation of site percolation ([10]
p. 24). Using percolation techniques, one can prove that $\mu_{n,p}(\mathrm{BLR})$ tends to 0 if
$p < \frac{1}{2}$, to 1 if $p > \frac{1}{2}$.
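The complementarity of BLR and WTB can be observed experimentally with a breadth-first search restricted to the six allowed steps. The Python sketch below is illustrative only: the representation `image[i][j]` with `i` the horizontal and `j` the vertical coordinate, the 0-based indexing, and the function names are assumptions of this example.

```python
import random
from collections import deque

# The six steps allowed on a 6-connected path: plus or minus (1,0), (0,1), (1,1).
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1)]

def crosses(image, color, axis):
    """True if pixels of the given color join side 0 to side n-1 along the given axis.

    axis=0 means left to right (first coordinate), axis=1 means top to bottom.
    Borders are not crossed: there is no periodic wrap-around here.
    """
    n = len(image)
    starts = [(0, t) if axis == 0 else (t, 0) for t in range(n)]
    starts = [(i, j) for i, j in starts if image[i][j] == color]
    seen = set(starts)
    queue = deque(starts)
    while queue:
        i, j = queue.popleft()
        if (i, j)[axis] == n - 1:
            return True
        for di, dj in STEPS:
            a, b = i + di, j + dj
            if 0 <= a < n and 0 <= b < n and (a, b) not in seen and image[a][b] == color:
                seen.add((a, b))
                queue.append((a, b))
    return False

def random_image(n, p, rng):
    """n x n image with independent pixels, 1 (black) with probability p."""
    return [[1 if rng.random() < p else 0 for _ in range(n)] for _ in range(n)]

rng = random.Random(0)
samples = [random_image(8, 0.5, rng) for _ in range(1000)]
blr = sum(crosses(img, 1, 0) for img in samples)
wtb = sum(crosses(img, 0, 1) for img in samples)
print(blr, wtb, blr + wtb)
```

On every sampled image exactly one of the two crossings exists, so the two counts add up to the number of samples, and at $p = \frac{1}{2}$ each count is close to half of it.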




Figure 2.2 A 6-connected path of black pixels from left to right. 



3. Threshold functions for basic local sentences 



The notions studied in this section have exact counterparts in the theory of random 
graphs as presented by Spencer [16]. We begin with the asymptotic probability of 
single pattern sentences, which correspond to the appearance of subgraphs ([16] 
p. 309). 

Proposition 3.1. Let $r$ and $k$ be two integers such that $0 < k < (2r + 1)^2$. Let $I$ be
a fixed $(2r + 1) \times (2r + 1)$ image, with $k$ black pixels and $h = (2r + 1)^2 - k$ white
pixels. Let $D(x)$ be the complete description of the ball $B(x, r)$ satisfied only by
a copy of $I$, centered at $x$. Let $D$ be the sentence $(\exists x\, D(x))$. Let $p = p(n)$ be a
function from $\mathbb{N}$ to $[0, 1]$.

\[ \text{If } \lim_{n \to \infty} n^2 p(n)^k = 0 \text{ then } \lim_{n \to \infty} \mu_{n,p(n)}(D) = 0 . \qquad (5) \]

\[ \text{If } \lim_{n \to \infty} n^2 p(n)^k (1 - p(n))^h = +\infty \text{ then } \lim_{n \to \infty} \mu_{n,p(n)}(D) = 1 . \qquad (6) \]

\[ \text{If } \lim_{n \to \infty} n^2 (1 - p(n))^h = 0 \text{ then } \lim_{n \to \infty} \mu_{n,p(n)}(D) = 0 . \qquad (7) \]







Proof. We already noticed the symmetry of the problem: swapping black and 
white together with p and 1—p should leave statements unchanged. In particular 
the proofs of (5) and (7) are symmetric, and only the former will be given. 

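For a given $x$, the probability of occurrence of $I$ in the ball $B(x, r)$ is:

\[ \mu_{n,p(n)}(D(x)) = p(n)^k (1 - p(n))^h . \]

The pattern sentence $D$ is the disjunction of all $D(x)$'s:

\[ D \leftrightarrow \bigvee_{x \in X_n} D(x) . \]

Hence:

\[ \mu_{n,p(n)}(D) \le n^2 p(n)^k (1 - p(n))^h , \]

from which (5) follows.

Consider now the following set of pixels:

\[ T_n = \Big\{ \big( r + 1 + \alpha(2r + 1),\ r + 1 + \beta(2r + 1) \big) ,\ \alpha, \beta = 0, \dots, \Big\lfloor \frac{n}{2r+1} \Big\rfloor - 1 \Big\} , \qquad (8) \]

where $\lfloor \cdot \rfloor$ denotes the integer part. Call $\tau(n)$ the cardinality of $T_n$:

\[ \tau(n) = \Big\lfloor \frac{n}{2r+1} \Big\rfloor^2 , \]

which is of order $n^2$. Notice that the disjunction of $D(x)$'s for $x \in T_n$ implies $D$:

\[ \bigvee_{x \in T_n} D(x) \rightarrow D . \]

The distance between any two distinct pixels $x, y \in T_n$ is larger than $2r$, and the
balls $B(x, r)$ and $B(y, r)$ do not overlap. Therefore the events "$\mathcal{I}_{n,p(n)} \models D(x)$" for
$x \in T_n$ are mutually independent. Thus:

\[ \mu_{n,p(n)}(D) \ge \mu_{n,p(n)}\Big( \bigvee_{x \in T_n} D(x) \Big) = 1 - \big( 1 - p(n)^k (1 - p(n))^h \big)^{\tau(n)} \ge 1 - \exp\big( -\tau(n)\, p(n)^k (1 - p(n))^h \big) , \]

hence (6).

□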
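The two bounds used in this proof, the union bound over all $n^2$ centers and the independence bound over a sub-lattice of disjoint balls, can be evaluated for concrete values. The Python sketch below is an illustration; the function name and the sample parameters are assumptions, and the sub-lattice is taken with $\lfloor n/(2r+1) \rfloor^2$ points.

```python
import math

def appearance_bounds(n, r, k, p):
    """Lower and upper bounds on the probability that a fixed pattern appears.

    The pattern is a (2r+1) x (2r+1) image with k black pixels; p is the level.
    """
    side = 2 * r + 1
    h = side * side - k                  # number of white pixels in the pattern
    q = p ** k * (1 - p) ** h            # probability of the pattern in one given ball
    tau = (n // side) ** 2               # number of pairwise disjoint balls
    upper = min(1.0, n * n * q)          # union bound over all n^2 centers
    lower = 1.0 - math.exp(-tau * q)     # independence over the disjoint balls
    return lower, upper

n, r, k = 1000, 1, 3                     # threshold exponent is 2/k = 2/3
print(appearance_bounds(n, r, k, n ** -1.0))      # level far below n^(-2/3)
print(appearance_bounds(n, r, k, n ** (-1 / 3)))  # level far above n^(-2/3)
```

Below the threshold the upper bound is already small, and above it the lower bound is already close to 1, which is the mechanism behind statements (5) and (6).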



Due to the symmetry of the model, we shall consider from now on that $p(n) < \frac{1}{2}$.
Proposition 3.1 shows that the appearance of a given sub-image only depends on
its number of black pixels: if $p(n)$ is small compared to $n^{-2/k}$, then no sub-image of
fixed size, with $k$ black pixels, should appear in $\mathcal{I}_{n,p(n)}$. If $p(n)$ is large compared
to $n^{-2/k}$, all sub-images with $k$ black pixels should appear. Proposition 3.1 does not
cover the particular cases $k = 0$ (appearance of a white square) and $k = (2r + 1)^2$
(black square). They are easy to deal with. Denote by $W$ (resp.: $B$) the pattern
sentence $(\exists x\, D(x))$, where $D(x)$ denotes the complete description of $B(x, r)$ being
all white (resp.: all black). Then $\mu_{n,p(n)}(W)$ always tends to 1 (remember that
$p(n) < \frac{1}{2}$). Statements (5) and (6) apply to $B$, with $k = (2r + 1)^2$.



The notion of threshold function is a formalisation of the behaviors that have 
just been described. 







Definition 3.2. Let $A$ be a sentence. A threshold function for $A$ is a function $r(n)$
such that:

\[ \lim_{n \to \infty} \frac{p(n)}{r(n)} = 0 \ \text{ implies } \ \lim_{n \to \infty} \mu_{n,p(n)}(A) = 0 , \]

and:

\[ \lim_{n \to \infty} \frac{p(n)}{r(n)} = +\infty \ \text{ implies } \ \lim_{n \to \infty} \mu_{n,p(n)}(A) = 1 . \]


Notice that a threshold function is not unique. For instance if $r(n)$ is a threshold
function for $A$, then so is $c\, r(n)$ for any positive constant $c$. It is customary
to ignore this and talk about "the" threshold function of $A$. For instance, the
threshold function for "there exists a black pixel" is $n^{-2}$.

Proposition 3.1 essentially says that the threshold function for the appearance
of a given sub-image $I$ is $n^{-2/k}$, where $k$ is the number of black pixels in
$I$. Proposition 3.4 below will show that the threshold function for a basic local
sentence $L$ is $n^{-2/k(L)}$, where $k(L)$ is an integer that we call the index of $L$. Its
definition refers to the decomposition (3) of a local property into a finite disjunction
of complete descriptions, already used in the proof of proposition 2.4.



Definition 3.3. Let $L$ be the basic local sentence defined by:

\[ \exists x_1 \dots \exists x_m \Big( \Big( \bigwedge_{1 \le i < j \le m} d(x_i, x_j) > 2r \Big) \wedge \Big( \bigwedge_{1 \le i \le m} \psi_i(x_i) \Big) \Big) . \]

If $L$ is not satisfiable, then we shall set $k(L) = +\infty$. If $L$ is satisfiable, for each
$i = 1, \dots, m$, consider the finite set $\{D_{i,1}, \dots, D_{i,d_i}\}$ of those complete descriptions
on the ball $B(x_i, r)$ which imply $\psi_i(x_i)$.

Each complete description $D_{i,j}(x_i)$ corresponds to an image on $B(x_i, r)$. Denote
by $k_{i,j}$ its number of black pixels.

The index of $L$, denoted by $k(L)$, is defined by:

\[ k(L) = \max_{i = 1, \dots, m}\ \min_{j = 1, \dots, d_i} k_{i,j} . \qquad (9) \]
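The index can be computed by brute force for small radii: enumerate all $(2r+1) \times (2r+1)$ patterns, keep those satisfying each local condition, and take the max over $i$ of the min number of black pixels. The sketch below is a hypothetical illustration in which each $\psi_i$ is modelled as a Python predicate on a pattern (a tuple of rows of 0/1 values); the example predicates are not taken from the text.

```python
import math
from itertools import product

def min_black(psi, r):
    """Smallest number of black pixels among patterns satisfying psi, or None if none does."""
    side = 2 * r + 1
    best = None
    for bits in product((0, 1), repeat=side * side):
        pattern = tuple(bits[row * side:(row + 1) * side] for row in range(side))
        if psi(pattern):
            count = sum(bits)
            best = count if best is None else min(best, count)
    return best

def index(psis, r):
    """max over i of min over satisfying patterns; infinite if some psi_i is unsatisfiable."""
    mins = [min_black(psi, r) for psi in psis]
    return math.inf if None in mins else max(mins)

# Hypothetical local conditions for r = 1 (3 x 3 patterns).
def center_black(pat):
    return pat[1][1] == 1

def middle_row_black(pat):
    return all(v == 1 for v in pat[1])

def all_white(pat):
    return all(v == 0 for row in pat for v in row)

print(index([center_black, middle_row_black], 1))  # most demanding condition needs 3 black pixels
print(index([all_white], 1))
```

With these choices the sentence combining "a black center" and "a black middle row" has index 3, so under proposition 3.4 its threshold function would be $n^{-2/3}$.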



The intuition behind definition 3.3 is the following. Assume $p(n)$ is small
compared to $n^{-2/k(L)}$. Then there exists $i$ such that none of the $D_{i,j}(x_i)$ can be
satisfied, therefore there is no $x_i$ such that $\psi_i(x_i)$ is satisfied, and $L$ is not satisfied.
On the contrary, if $p(n)$ is large compared to $n^{-2/k(L)}$, then for all $i = 1, \dots, m$,
$\psi_i(x_i)$ should be satisfied for at least one pixel $x_i$, and the probability of satisfying
$L$ should be large. In other words, $n^{-2/k(L)}$ is the threshold function of $L$.

Proposition 3.4. Let $L$ be a basic local property, and $k(L)$ be its index. If $L$ is
satisfiable and $k(L) > 0$, then its threshold function is $n^{-2/k(L)}$. If $k(L) = 0$, its
probability tends to 1 (for $p(n) < \frac{1}{2}$).



Proof. Assume $L$ is satisfiable (otherwise its probability is null) and $k(L) > 0$. Let
$r(n) = n^{-2/k(L)}$. For $p(n) < \frac{1}{2}$, we need to prove that $\mu_{n,p(n)}(L)$ tends to 0 if $p(n)/r(n)$
tends to 0, and that it tends to 1 if $p(n)/r(n)$ tends to $+\infty$. The former will be
proved first.

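Consider again the decomposition of $L$ into complete descriptions:

\[ L \leftrightarrow \exists x_1 \dots \exists x_m \Big( \Big( \bigwedge_{1 \le i < j \le m} d(x_i, x_j) > 2r \Big) \wedge \Big( \bigwedge_{1 \le i \le m}\ \bigvee_{1 \le j \le d_i} D_{i,j}(x_i) \Big) \Big) . \]

If $p(n)/r(n)$ tends to 0, there exists $i$ such that:

\[ \forall j = 1, \dots, d_i , \quad \lim_{n \to \infty} n^2 p(n)^{k_{i,j}} = 0 . \]

By proposition 3.1, the probability of $(\exists x\, D_{i,j}(x))$ tends to zero for all $j =
1, \dots, d_i$. Therefore the probability of $(\exists x\, \psi_i(x))$ tends to 0, which implies that
$\mu_{n,p(n)}(L)$ tends to 0.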

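Conversely, for each $i = 1, \dots, m$, choose one of the $D_{i,j}(x)$'s, such that the
number of black pixels in the corresponding image is minimal (among all $k_{i,j}$'s).
Denote that particular description by $D_i(x)$. Consider now the following pattern
sentence, which implies $L$:

\[ \exists x_1 \dots \exists x_m \Big( \Big( \bigwedge_{1 \le i < j \le m} d(x_i, x_j) > 2r \Big) \wedge \Big( \bigwedge_{1 \le i \le m} D_i(x_i) \Big) \Big) . \qquad (10) \]

As in the proof of proposition 3.1, we shall use the lattice $T_n$, defined by (8).
Remember that its cardinality $\tau(n)$ is of order $n^2$. The pattern sentence (10) is
implied by:

\[ \exists x_1 \dots \exists x_m \Big( \Big( \bigwedge_{1 \le i \le m} x_i \in T_n \Big) \wedge \Big( \bigwedge_{1 \le i < j \le m} x_i \ne x_j \Big) \wedge \Big( \bigwedge_{1 \le i \le m} D_i(x_i) \Big) \Big) . \qquad (11) \]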






Assume first that $k(L) = 0$. Then necessarily, for each $i$, the image corresponding
to $D_i(x)$ has only white pixels. With $p(n) < \frac{1}{2}$, the probability of observing a
$(2r + 1) \times (2r + 1)$ white image is larger than $\pi = (\frac{1}{2})^{(2r+1)^2}$. Since sub-images
centered at the points of $T_n$ are independent, the probability of (11) is larger
than:

\[ \sum_{s=m}^{\tau(n)} \binom{\tau(n)}{s} \pi^s (1 - \pi)^{\tau(n) - s} , \]

which tends to 1 as $n$ tends to infinity.

Assume now that $k(L) > 0$. The images corresponding to the minimal descriptions
$D_i$ need not be all different: renumber different descriptions $D_i$ as
$D'_1, \dots, D'_{m'}$. Denote by $k(i)$ the number of black pixels of $D'_i$ (hence $k(L) =
\max\{k(i)\}$). Let $\pi_i(n)$ be the probability of $D'_i(x)$, for a given $x$:

\[ \pi_i(n) = p(n)^{k(i)} (1 - p(n))^{(2r+1)^2 - k(i)} . \]

From the random image $\mathcal{I}_{n,p(n)}$, define the random variable $N_i$ as the number of those
pixels $x_i \in T_n$ such that $\mathcal{I}_{n,p(n)}$ is described by $D'_i(x_i)$ on the ball $B(x_i, r)$. Since the
different balls do not overlap, $N_i$ has a binomial distribution, with parameters $\tau(n)$
and $\pi_i(n)$. Assuming $p(n)/r(n)$ tends to $+\infty$, it is easy to check that the product
$\tau(n)\pi_i(n)$ also tends to infinity. Indeed, $\tau(n)$ is of order $n^2$, and $r(n)^{k(i)} = n^{-2k(i)/k(L)}$
is at least of order $n^{-2}$. Therefore the probability that $N_i$ is larger than $m$ tends to
1 for each $i$, which implies that the probability for all the $N_i$'s to be larger than $m$
also tends to 1. But if all the $N_i$'s are larger than $m$, then one can be sure that all
the $D_i(x_i)$ are satisfied for different centers $x_1, \dots, x_m$ of the lattice $T_n$. Therefore
$\mathcal{I}_{n,p(n)}$ satisfies (11), hence (10) and $L$. □

Having characterized the threshold functions of all basic local properties, the
proof of theorem 1.2 is now clear. If $p(n)\, n^{2/k}$ tends to 0 or $+\infty$ for any positive
integer $k$, then by proposition 3.4 the probability of any basic local sentence tends
to 0 or 1. This remains true for any boolean combination of basic local sentences
(cf. proposition 2.4). By Gaifman's theorem, these boolean combinations cover all
first-order sentences. Hence the zero-one law holds for first-order logic.
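For the power-law family $p(n) = n^{-\alpha}$ with $\alpha > 0$, the hypothesis on $p(n)\, n^{2/k}$ can be tested exactly: the product equals $n^{2/k - \alpha}$, so it tends to 0 or $+\infty$ for every $k$ unless $\alpha = 2/k$ for some positive integer $k$. The sketch below restricts itself to rational exponents so that the comparison is exact; the function names and this restriction are assumptions of the example.

```python
from fractions import Fraction

def limit_of_scaled_p(alpha, k):
    """Limit of n^(2/k) * n^(-alpha): 'inf', '0', or '1' in the excluded case alpha = 2/k."""
    exponent = Fraction(2, k) - Fraction(alpha)
    if exponent > 0:
        return "inf"
    if exponent < 0:
        return "0"
    return "1"

def is_threshold_exponent(alpha):
    """True if alpha > 0 and alpha = 2/k for some positive integer k."""
    alpha = Fraction(alpha)
    return alpha > 0 and (Fraction(2) / alpha).denominator == 1

for alpha in (Fraction(2, 3), Fraction(3, 4), Fraction(1), Fraction(2, 5)):
    print(alpha, is_threshold_exponent(alpha),
          [limit_of_scaled_p(alpha, k) for k in range(1, 6)])
```

An exponent such as $3/4$ lies strictly between $2/3$ and $1$, so every scaled product goes to 0 or $+\infty$ and the zero-one law applies, whereas $2/3$ hits the threshold for $k = 3$.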



References 

[1] B. Bollobás. Random Graphs. Academic Press, London, 1985.

[2] K.J. Compton. 0-1 laws in logic and combinatorics. In I. Rival, editor, Algorithms
and order, pages 353-383. Kluwer, Dordrecht, 1989.

[3] A. Desolneux, L. Moisan, and J.M. Morel. Meaningful alignments. Int. J. Computer
Vision, 40(1):7-23, 2000.

[4] H.D. Ebbinghaus and J. Flum. Finite model theory. Springer-Verlag, Berlin, 1995.

[5] P. Erdős and A. Rényi. On the evolution of random graphs. Mat. Kutató Int. Közl.,
5:17-60, 1960.

[6] R. Fagin. Probabilities on finite models. J. of Symbolic Logic, 41:50-58, 1976.

[7] H. Gaifman. Concerning measures in first-order calculi. Israel J. of Mathematics,
2:1-18, 1964.

[8] H. Gaifman. On local and non local properties. In J. Stern, editor, Logic Colloquium
'81, pages 105-135. North Holland, Amsterdam, 1982.

[9] Y.V. Glebskii, D.I. Kogan, M.I. Liogonkii, and V.A. Talanov. Range and degree of
realizability of formulas in the restricted predicate calculus. Cybernetics, 5:142-154,
1969.

[10] G. Grimmett. Percolation. Springer-Verlag, New York, 1989.

[11] R.M. Karp. The transitive closure of a random digraph. Rand. Struct. Algo., 1(1):73-94,
1990.

[12] T. Łuczak. The phase transition in the evolution of random digraphs. J. Graph
Theory, 14(2):217-223, 1990.

[13] W. Oberschelp. Asymptotic 0-1 laws in combinatorics. In D. Jungnickel, editor,
Combinatorial theory, volume 969 of L.N. in Mathematics, pages 276-292.
Springer-Verlag, Berlin, 1982.

[14] J. Serra. Image analysis and mathematical morphology, volume 1. Academic Press,
New York, 1982.

[15] S. Shelah and J. Spencer. Zero-one laws for sparse random graphs. J. Amer. Math.
Soc., 1:97-115, 1988.

[16] J. Spencer. Nine lectures on Random Graphs. In P. Bernard, editor, École d'été de
probabilités de Saint-Flour XXI, volume 1541 of L.N. in Mathematics, pages 293-343.
Springer-Verlag, New York, 1991.



David Coupier, Agnès Desolneux, and Bernard Ycart

MAP5 CNRS UMR 8145, Université Paris 5, France
{coupier,desolneux,ycart}@math-info.univ-paris5.fr







Coarse and Sharp Transitions for Random
Generalized Satisfiability Problems

Nadia Creignou and Hervé Daudé



ABSTRACT: We study threshold phenomena for random generalized
satisfiability problems. These fundamental problems were defined by Schaefer, who
gave a complete complexity classification. We give here a complete classification
of the nature (coarse or sharp) of the threshold for all generalized satisfiability
problems. This new classification is based on easily decidable local properties, and
thus provides an exact probabilistic counterpart of Schaefer's complexity result.
[T. Schaefer. The complexity of satisfiability problems, in Proceedings 10th STOC,
San Diego (CA, USA), pages 216-226. Association for Computing Machinery,
1978.]



1. Introduction 

The satisfiability problem (SAT) is the problem of determining whether there exists 
an assignment of values to the variables of a given Boolean formula which causes it 
to evaluate to true. While this problem appears in numerous engineering, scientific 
and operations research applications, it is also unfortunately the prototypical NP- 
complete problem. Average case analysis and experiments have provided evidence 
of the existence of a phase transition for the probability of a random formula being 
satisfiable (see for instance [12]). More precisely, a sharp threshold phenomenon 
has been observed in the probability of a random $k$-CNF formula being satisfiable
with respect to the ratio of the number of clauses to the number of variables.
Experiments have revealed that the critical value of this ratio, $c_k$, at which the
phase transition occurs, coincides with the value at which the average cost of 
natural solvers for the problem peaks. Therefore it has become clear that one 
could gather some interesting information in studying the relationship between 
hardness and random formulae (see [8]). 

In order to make general statements about the probabilistic behavior of nat- 
ural problems, one has to provide a framework which both captures a class of 
problems which exhibits much of the natural diversity of computational problems 
and allows a uniform probabilistic analysis. Constraint satisfaction problems are 
good candidates for that. These problems are interesting in their own right since 
they occur commonly in practice, in optimization and in settings arising from Artificial
Intelligence. In [4] we initiated the study of random generalized satisfiability
problems, SAT($\mathcal{F}$), first defined by Schaefer in 1978 [15] (and which correspond to
constraint satisfaction problems over the Boolean domain). In such a problem the
set of types of constraints that are allowed in the input, $\mathcal{F}$, is fixed. The major
interest of this class of problems is that its complexity has been completely
classified, and thus it has proved to be an excellent platform to search for a
formal basis for empirical observations (see [7] for a survey).







In [4] we gave a first location of the transition for such random problems,
by considering a natural probabilistic model: the so-called $G_n(p)$ model arising in
random graph theory. Such a model has been independently considered by Molloy
[13] for constraint satisfaction over finite domains. In [5] we clarified the
Friedgut-Bourgain sharpness criterion [10] for random CSPs and applied it to get a
classification of the nature of the threshold for the restricted framework of symmetric
generalized satisfiability problems (in which, roughly speaking, the truth values
0 and 1 have the same weight). We now give the complete classification of the
nature of the threshold for all generalized satisfiability problems, together with
the scale at which the transition occurs. As in Schaefer's dichotomy theorem, this
classification is based on easily decidable local properties of the set of allowed
constraints $\mathcal{F}$. Our main contribution here is twofold. On the one hand we identify
all the scales at which a coarse transition can occur. We show that the scale of such
a transition is governed by minimal elements. On the other hand we show that
supersaturated hypergraphs are a powerful combinatorial tool for proving
sharpness.



2. Random Boolean Constraint satisfaction problems 

2.1. Boolean Constraint satisfaction problems 

Consider $n$ Boolean variables $x_1, \dots, x_n$ and a Boolean function $f$ of arity $k$,
$f : \{0,1\}^k \to \{0,1\}$. A constraint generated from $f$ is given by $f$ and a subset of $k$
indices in $\{1, \dots, n\}$, $(f, i_1 < \dots < i_k)$; it is referred to as an application of the
function $f$ to $x_{i_1}, \dots, x_{i_k}$. A truth assignment $I : \{x_1, \dots, x_n\} \to \{0,1\}$
satisfies such a constraint if $f(I(x_{i_1}), \dots, I(x_{i_k})) = 1$.

The set $C_n(f)$ denotes the set of all constraints that can be generated from
$f$ over $n$ variables, $\#C_n(f) = \binom{n}{k}$.

Let $I : \{x_1, \dots, x_n\} \to \{0,1\}$ be a truth assignment and $C \in C_n(f)$ be
a constraint, $C = (f, i_1 < i_2 < \dots < i_k)$. We denote $I(C) = f(I(x_{i_1}), \dots, I(x_{i_k}))$
and $S_n(C) = \{I \text{ such that } I(C) = 1\}$.

Throughout the paper $\mathcal{F}$ will denote a finite multi-set of constraint functions
of fixed arity $k$ over the domain $\{0,1\}$, $\mathcal{F} = \{f_1, \dots, f_h\}$. The set $C_n(\mathcal{F})$ is the
set of all constraints that can be generated over $n$ variables from any constraint function
$f_i$ in $\mathcal{F}$, $C_n(\mathcal{F}) = \bigsqcup_{i=1}^{h} C_n(f_i)$. There are $N_{\mathcal{F}}$ such constraints,
$N_{\mathcal{F}} = \#C_n(\mathcal{F}) = \binom{n}{k} \cdot \#\mathcal{F}$.

The set of collections of constraints, also called formulas, from $C_n(\mathcal{F})$ is denoted
by CSP$_n(\mathcal{F})$ (or simply CSP($\mathcal{F}$) when no confusion can arise),
CSP$_n(\mathcal{F}) = \mathcal{P}(C_n(\mathcal{F}))$. We will denote by SAT$_n(\mathcal{F})$ (or simply SAT($\mathcal{F}$)) the
property for such a collection of constraints of being satisfiable and by UNSAT($\mathcal{F}$)
the property of being unsatisfiable.

In order to obtain a nontrivial satisfiability property SAT($\mathcal{F}$) we restrict our
attention to interesting sets $\mathcal{F}$, that is, sets $\mathcal{F}$ generating no empty constraints
(for every $f$ in $\mathcal{F}$, $f^{-1}(1) \neq \emptyset$) and in which there exist $g_0$ and $g_1$ such that
$g_0(0, \dots, 0) = 0$ and $g_1(1, \dots, 1) = 0$. These interesting sets correspond to non-0-valid and
non-1-valid sets in Schaefer's terminology.







Example 1.

3-SAT: for $0 \le i \le 7$, let $f_i$ be the ternary Boolean function such that
$f_i^{-1}(0)$ corresponds to the dyadic expansion of $i$ (for instance $f_0^{-1}(0) =
\{(0,0,0)\}$ and thus in disjunctive form $f_0(x,y,z) = (x \vee y \vee z)$, and
$f_5^{-1}(0) = \{(1,0,1)\}$ and thus $f_5(x,y,z) = (\bar{x} \vee y \vee \bar{z})$). Then, 3-SAT =
SAT($\{f_0, \dots, f_7\}$).

k-XOR-SAT: let $h_0(x_1, \dots, x_k) = x_1 \oplus \dots \oplus x_k$ and $h_1(x_1, \dots, x_k) =
x_1 \oplus \dots \oplus x_k \oplus 1$. Then, k-XOR-SAT = SAT($\{h_0, h_1\}$).
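
As an illustration of Example 1, the following Python sketch (not part of the original paper) encodes the constraint functions as truth tables. The helper names `three_sat_function` and `xor_functions`, and the choice of a dictionary indexed by tuples in $\{0,1\}^k$, are assumptions made only for this illustration.

```python
from itertools import product

def three_sat_function(i):
    """Truth table of f_i: it takes the value 0 exactly on the binary expansion of i."""
    falsifying = tuple((i >> shift) & 1 for shift in (2, 1, 0))
    return {a: 0 if a == falsifying else 1 for a in product((0, 1), repeat=3)}

def xor_functions(k):
    """Truth tables of h_0 = x_1 xor ... xor x_k and h_1 = h_0 xor 1."""
    h0 = {a: sum(a) % 2 for a in product((0, 1), repeat=k)}
    h1 = {a: 1 - value for a, value in h0.items()}
    return h0, h1

# f_0 is the clause (x or y or z); f_5 is (not x or y or not z).
f0, f5 = three_sat_function(0), three_sat_function(5)
```

With this encoding, 3-SAT corresponds to the list `[three_sat_function(i) for i in range(8)]` and k-XOR-SAT to the pair returned by `xor_functions(k)`.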

For any fixed $\mathcal{F}$, we are interested in studying the probability that a collection
of constraints in $C_n(\mathcal{F})$ is unsatisfiable. Such a collection (or formula) $s$ can be
seen as an element of $\{0,1\}^{N_{\mathcal{F}}}$ and we will consider the model, analogous to the
so-called $G_n(p)$ model of random graph theory, in which each constraint appears
in $s$ independently with probability $p$. For $0 \le p \le 1$, let $\mu_p$ be the probability
measure defined on $\{0,1\}^{N_{\mathcal{F}}}$ by $\mu_p(\{s\}) = p^{w(s)} (1-p)^{N_{\mathcal{F}} - w(s)}$, where $w(s)$ denotes
the size of $s$ ($w(s) = \#\{i \,/\, s_i = 1\}$). We will study the asymptotic behavior of
$\mu_p(\mathrm{UNSAT}(\mathcal{F}))$ when $N_{\mathcal{F}}$ (or equivalently $n$) tends to infinity. As a function of
$p$, $\mu_p(\mathrm{UNSAT}(\mathcal{F}))$ defines an increasing one-to-one correspondence from $[0,1]$ onto
$[0,1]$ (see [3]). We will make precise the abrupt change of $\mu_p(\mathrm{UNSAT}(\mathcal{F}))$ from
near 0 to near 1 by considering, for any $c \in [0,1]$, the critical probability $p_c(n, \mathcal{F})$
(or simply $p_c(n)$) defined by $\mu_{p_c}(\mathrm{UNSAT}(\mathcal{F})) = c$. Indeed, in our model $N_{\mathcal{F}} \cdot p$ is the
average number of constraints in a random formula. Depending on the type of constraint
functions allowed in $\mathcal{F}$, we will give the scale of $N_{\mathcal{F}} \cdot p_c(n)$ as a function of $n$.
For any set $\mathcal{F}$, after having made precise the scale at which a transition from
satisfiability to unsatisfiability occurs, we will classify the nature, coarse or sharp,
of this transition. Let us recall that one says that UNSAT($\mathcal{F}$) has a sharp transition
if for every $\varepsilon \in \,]0, 1/2[$ the ratio $\frac{p_{1-\varepsilon}(n) - p_\varepsilon(n)}{p_{1/2}(n)}$ tends to 0 as $n$ tends to infinity.
If for some $\varepsilon > 0$, $\lim_{n \to \infty} \frac{p_{1-\varepsilon}(n) - p_\varepsilon(n)}{p_{1/2}(n)} > 0$, then one says that UNSAT($\mathcal{F}$) has a
coarse transition.
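
The random model above can be explored numerically for very small $n$. The sketch below is an illustration, not the authors' code: it enumerates $C_n(\mathcal{F})$, keeps each constraint independently with probability $p$, and estimates $\mu_p(\mathrm{UNSAT}(\mathcal{F}))$ by brute force. Constraint functions are assumed to be given as truth tables (dictionaries over $\{0,1\}^k$), variables are indexed from 0, and all function names are hypothetical.

```python
import random
from itertools import combinations, product

def all_constraints(n, functions, k):
    """Enumerate C_n(F): pairs (function index, increasing k-tuple of variable indices)."""
    return [(j, idx) for j in range(len(functions)) for idx in combinations(range(n), k)]

def sample_formula(n, functions, k, p, rng):
    """Keep each constraint of C_n(F) independently with probability p."""
    return [c for c in all_constraints(n, functions, k) if rng.random() < p]

def is_unsat(n, functions, formula):
    """Brute force over the 2^n truth assignments (only sensible for small n)."""
    for assignment in product((0, 1), repeat=n):
        if all(functions[j][tuple(assignment[i] for i in idx)] == 1 for j, idx in formula):
            return False
    return True

def estimate_unsat_probability(n, functions, k, p, trials, seed=0):
    """Monte Carlo estimate of mu_p(UNSAT(F)) from `trials` independent formulas."""
    rng = random.Random(seed)
    hits = sum(is_unsat(n, functions, sample_formula(n, functions, k, p, rng))
               for _ in range(trials))
    return hits / trials
```

Plotting this estimate against $N_{\mathcal{F}} \cdot p$ for increasing $n$ gives an empirical picture of the transition, although the exponential cost of `is_unsat` limits the experiment to small instances.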

2.2. Minimal elements 

Gathering some useful results on minimal elements will play an essential role in 
the analysis of the nature of the threshold. 

Definition 2.1. [Minimal unsatisfiable collection] A collection of constraints $m \in$
CSP($\mathcal{F}$) is said to be minimal for UNSAT($\mathcal{F}$) if $m \in$ UNSAT($\mathcal{F}$) and for all $m'$
strictly contained in $m$, $m' \in$ SAT($\mathcal{F}$).

For every collection of constraints $s$, $\#\mathrm{Var}(s)$ denotes the number of distinct
variables occurring in $s$. Let $\mathcal{M}_{\mathcal{F}}$ denote the set of minimal formulas for UNSAT($\mathcal{F}$).
A lower bound for the scale of the phase transition is obtained via minimal elements
by the following formula. Let

$$M_r = \#\{m \text{ minimal for UNSAT}(\mathcal{F}),\ w(m) = r\}.$$

Since for every function $f$ in $\mathcal{F}$, $f^{-1}(1)$ is not empty, we have:

$$\mu_p(\mathrm{UNSAT}(\mathcal{F})) \le \sum_{r \ge 2} M_r \cdot p^r. \qquad (1)$$

Observe that every $m$ in $\mathcal{M}_{\mathcal{F}}$ verifies

$$\#\mathrm{Var}(m) \le (k-1) w(m) + 1.$$







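This inequality provides a first upper bound for $M_r$. Indeed, observing that
$N_{\mathcal{F}} = \#\mathcal{F} \cdot \binom{n}{k}$, standard bounds for binomial coefficients give:

$$M_r \le \binom{n}{(k-1) \cdot r + 1} \cdot \binom{\#\mathcal{F} \cdot \binom{(k-1) \cdot r + 1}{k}}{r} \le n \cdot \left( \frac{N_{\mathcal{F}} \cdot k \cdot \exp(k)}{n} \right)^{r}. \qquad (2)$$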



Naturally the vacuity of the two following sets will play a central role for
getting a lower bound of the scale of the phase transition.

$$\mathcal{M}^1_{\mathcal{F}} = \{m \in \mathcal{M}_{\mathcal{F}} \,/\, \#\mathrm{Var}(m) = (k-1) w(m) + 1\},$$

$$\mathcal{M}^2_{\mathcal{F}} = \{m \in \mathcal{M}_{\mathcal{F}} \,/\, \#\mathrm{Var}(m) = (k-1) w(m)\}.$$

Moreover, it turns out that the vacuity of these two sets is determined by two
simple local properties of $\mathcal{F}$ that we define next.



Definition 2.2. A Boolean function $f : \{0,1\}^k \to \{0,1\}$ strongly depends on one
component if there exist $\varepsilon \in \{0,1\}$ and $1 \le i \le k$ such that for all $(a_1, \dots, a_k)$ in
$\{0,1\}^k$, $f(a_1, \dots, a_k) = 1$ implies $a_i = \varepsilon$.

Definition 2.3. A Boolean constraint function $f : \{0,1\}^k \to \{0,1\}$ strongly depends
on a 2-XOR-relation if there exist two indices $1 \le i \ne j \le k$ such that, for
all $(a_1, \dots, a_k)$ in $\{0,1\}^k$, $f(a_1, \dots, a_k) = 1$ implies $a_i \oplus a_j = 1$.



Notation 1. In the sequel (P1) and (P2) will denote the following properties:
(P1): $\mathcal{F}$ does not contain any function strongly depending on one component.
(P2): $\mathcal{F}$ does not contain any function strongly depending on a 2-XOR-relation.
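
Both properties are decidable by inspecting truth tables, which is what makes the classification "easily decidable". The following Python sketch is an illustration under our own encoding (functions as dictionaries over $\{0,1\}^k$, hypothetical helper names); it is not taken from the paper.

```python
from itertools import product

def strongly_depends_on_one_component(f, k):
    """True if some index i and value e satisfy: f(a) = 1 implies a_i = e."""
    models = [a for a in product((0, 1), repeat=k) if f[a] == 1]
    return any(all(a[i] == e for a in models) for i in range(k) for e in (0, 1))

def strongly_depends_on_2xor(f, k):
    """True if some pair i != j satisfies: f(a) = 1 implies a_i xor a_j = 1."""
    models = [a for a in product((0, 1), repeat=k) if f[a] == 1]
    return any(all(a[i] ^ a[j] == 1 for a in models)
               for i in range(k) for j in range(i + 1, k))

def holds_P1(functions, k):
    """(P1): no function of the set strongly depends on one component."""
    return not any(strongly_depends_on_one_component(f, k) for f in functions)

def holds_P2(functions, k):
    """(P2): no function of the set strongly depends on a 2-XOR-relation."""
    return not any(strongly_depends_on_2xor(f, k) for f in functions)
```

Note that a function with no satisfying tuple would pass both tests vacuously; such functions are excluded by the restriction to interesting sets.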



If $\mathcal{F}$ contains a function $u$ depending on one component, without loss of
generality suppose that $u(a_1, \dots, a_k) = 1$ implies $a_k = 0$. Then, let us consider
a formula $s$ of size $(k+1)$ formed by $k$ constraints whose associated constraint
function is $u$ and that have no common variables, together with the constraint
applying the non-0-valid function $g_0$ to the set of the last indices of each of the
previous constraints. For example $s = (u, 1 < \dots < k), \dots, (u, k^2 - k + 1 < \dots < k^2),
(g_0, k < 2k < \dots < k^2)$. Such a "comb"-formula $s$ is UNSAT and every UNSAT
subformula $s'$ of $s$ must contain the $g_0$-constraint, thus verifying $\#\mathrm{Var}(s') =
(k-1) w(s') + 1$. This shows that if (P1) does not hold then $\mathcal{M}^1_{\mathcal{F}} \neq \emptyset$. The
converse is also true and was proved in [4].
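
The comb construction can be checked by exhaustive search on a small instance. The sketch below takes $k = 3$ and two hypothetical functions chosen for illustration: $u$, which equals 1 exactly when its last argument is 0, and $g_0$, the disjunction of its three arguments (so that $g_0(0,0,0) = 0$). These choices are assumptions, not the paper's.

```python
from itertools import product

K = 3
# u = 1 forces its last argument to 0 (strong dependence on one component).
u = {a: int(a[2] == 0) for a in product((0, 1), repeat=K)}
# g0 is a non-0-valid function: g0(0, 0, 0) = 0.
g0 = {a: int(any(a)) for a in product((0, 1), repeat=K)}

def satisfiable(formula):
    """Brute-force satisfiability of a list of (truth table, index tuple) pairs."""
    variables = sorted({i for _, idx in formula for i in idx})
    for values in product((0, 1), repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(f[tuple(assignment[i] for i in idx)] == 1 for f, idx in formula):
            return True
    return False

# Comb formula: three disjoint u-constraints plus g0 on their last variables.
comb = [(u, (1, 2, 3)), (u, (4, 5, 6)), (u, (7, 8, 9)), (g0, (3, 6, 9))]
num_vars = len({i for _, idx in comb for i in idx})
is_minimal = (not satisfiable(comb)) and all(
    satisfiable(comb[:j] + comb[j + 1:]) for j in range(len(comb)))
```

For this instance the whole comb is itself minimal, with 9 variables and 4 constraints, so it attains $\#\mathrm{Var} = (k-1)w + 1$.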

If $\mathcal{F}$ contains a function $v$ depending on a 2-XOR relation, without loss of
generality suppose that $v(a_1, \dots, a_k) = 1$ implies $a_1 \oplus a_k = 1$. Then, let us consider the
formula $m$ formed by the three following constraints (whose associated constraint
function is $v$): $C_1 = (v, 1 < \dots < k)$, $C_2 = (v, 1 < k+1 < k+3 < \dots < 3k-5 < 3k-3)$
and $C_3 = (v, k < k+2 < \dots < 3k-4 < 3k-3)$. Such a "triangle"-formula $m$ is
UNSAT and verifies $\#\mathrm{Var}(m) = 3(k-1)$. If in addition $\mathcal{F}$ does not contain any
function depending on one component, then $m$ is also minimal UNSAT. Therefore,
if (P1) holds but (P2) does not, then $\mathcal{M}^1_{\mathcal{F}} = \emptyset$ and $\mathcal{M}^2_{\mathcal{F}} \neq \emptyset$. The converse is also
true and was proved in [4].
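
A quick way to convince oneself of the index bookkeeping in the triangle formula is to build it for small $k$ and test it exhaustively. In the sketch below the function $v$ is a hypothetical representative (it equals 1 exactly when its first and last arguments differ), and the helper names are ours.

```python
from itertools import product

def triangle_formula(k):
    """Index tuples of C_1, C_2, C_3 from the text (1-based indices, k >= 3)."""
    c1 = tuple(range(1, k + 1))
    c2 = (1,) + tuple(range(k + 1, 3 * k - 2, 2))          # 1 < k+1 < k+3 < ... < 3k-3
    c3 = (k,) + tuple(range(k + 2, 3 * k - 3, 2)) + (3 * k - 3,)  # k < k+2 < ... < 3k-4 < 3k-3
    return [c1, c2, c3]

def v(values):
    """Illustrative constraint function: v = 1 iff first xor last argument is 1."""
    return values[0] ^ values[-1]

def satisfiable(constraints):
    """Brute-force satisfiability when every constraint applies v."""
    variables = sorted({i for c in constraints for i in c})
    for bits in product((0, 1), repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if all(v(tuple(assignment[i] for i in c)) == 1 for c in constraints):
            return True
    return False
```

The three constraints force $x_1 \oplus x_k = x_1 \oplus x_{3k-3} = x_k \oplus x_{3k-3} = 1$, which is impossible, while any two of them remain satisfiable.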

Finally we have proved the following result: 



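Proposition 2.1. Let $\mathcal{F}$ be a multiset of constraint functions.

(i) $\mathcal{M}^1_{\mathcal{F}} \neq \emptyset$ if and only if (P1) does not hold.

(ii) $\mathcal{M}^1_{\mathcal{F}} = \emptyset$ and $\mathcal{M}^2_{\mathcal{F}} \neq \emptyset$ if and only if (P1) holds but (P2) does not.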







As we will see next this proposition is a first step to establish a clear link
between the types of allowed constraints, $\mathcal{F}$, and the nature of the transition from
SAT($\mathcal{F}$) to UNSAT($\mathcal{F}$).



3. Coarse thresholds 

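Theorem 3.1. Let $\mathcal{F}$ be a multiset of constraint functions. If $\mathcal{F}$ contains a function
depending on one component, i.e., if (P1) does not hold, let $l_{\mathcal{F}} = \min\{w(m) \,/\, m \in
\mathcal{M}^1_{\mathcal{F}}\}$; then for every $0 < \varepsilon < 1$, and $n$ large enough:

$$\frac{\varepsilon^{1/2}}{l_{\mathcal{F}} \cdot k \exp(k)} \cdot n^{1 - 1/l_{\mathcal{F}}} \le N_{\mathcal{F}} \cdot p_\varepsilon(n) \le \varphi^{-1}(2\varepsilon) \cdot n^{1 - 1/l_{\mathcal{F}}},$$

where $\varphi(x) = 1 - \exp\left(-\left(\frac{x \cdot k!}{\#\mathcal{F}}\right)^{l_{\mathcal{F}}} \cdot \frac{1}{((k-1) l_{\mathcal{F}} + 1)!}\right)$.

Therefore, SAT($\mathcal{F}$) has a coarse transition occurring at the scale $n^{1 - 1/l_{\mathcal{F}}}$.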

Proof: According to the discussion preceding Proposition 2.1, $l_{\mathcal{F}}$ is well defined
and $2 \le l_{\mathcal{F}} \le k+1$. In the following, since no confusion can arise, $l_{\mathcal{F}}$ will be
simply denoted by $l$. The lower bound for the scale of the transition is obtained
from (1). Indeed,



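$$\mu_p(\mathrm{UNSAT}(\mathcal{F})) \le \sum_{r \le l-1} M_r \cdot p^r + \sum_{r \ge l} M_r \cdot p^r.$$

If $r \le l-1$, then

$$M_r \le \binom{n}{(k-1) \cdot r} \cdot \binom{\#\mathcal{F} \cdot \binom{(k-1) \cdot r}{k}}{r} \le \left( \frac{N_{\mathcal{F}} \cdot k \cdot \exp(k)}{n} \right)^{r}.$$

If $r \ge l$, then

$$M_r \le \binom{n}{(k-1) \cdot r + 1} \cdot \binom{\#\mathcal{F} \cdot \binom{(k-1) \cdot r + 1}{k}}{r} \le n \cdot \left( \frac{N_{\mathcal{F}} \cdot k \cdot \exp(k)}{n} \right)^{r}. \qquad (3)$$

Hence, if $N_{\mathcal{F}} \cdot p \cdot k \cdot \exp(k) \le \frac{\varepsilon^{1/2}}{l} \cdot n^{1 - 1/l}$, then (3) gives
$\mu_p(\mathrm{UNSAT}(\mathcal{F})) \le \varepsilon$ for $n$ large enough. Therefore,

$$N_{\mathcal{F}} \cdot p_\varepsilon(n) \ge \frac{\varepsilon^{1/2}}{l \cdot k \exp(k)} \cdot n^{1 - 1/l}.$$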



To get the second assertion, let us first choose a minimal UNSAT formula $m_0$
of weight $l$, such that $\#\mathrm{Var}(m_0) = (k-1) l + 1$, and let us consider the set $\mathcal{M}_0$
formed by the minimal UNSAT formulas isomorphic to $m_0$.

For each $m \in \mathcal{M}_0$, let $X_m$ indicate if $m$ is a subformula of $s$ or not. As $m$ is
of size $l$, $\mu_p(X_m = 1) = p^l = o(1)$ when $p = O(n^{1 - \frac{1}{l} - k})$. Now observe that:

$$\mu_p(\mathrm{UNSAT}(\mathcal{F})) \ge 1 - \mu_p\Big( \bigcap_{m \in \mathcal{M}_0} X_m = 0 \Big). \qquad (4)$$

The Janson Inequality (see for instance [2]) gives: 







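$$\mu_p\Big( \bigcap_{m \in \mathcal{M}_0} X_m = 0 \Big) \le \exp\left( -\nu + \frac{\sum_{j=1}^{l-1} \Delta_j}{2 \cdot (1 - p^l)} \right),$$

where $\nu = \sum_{m \in \mathcal{M}_0} \mu_p(X_m = 1) = p^l \cdot \binom{n}{(k-1) l + 1}$ and for $1 \le j \le l-1$

$$\Delta_j = \sum_{m, m' \text{ s.t. } w(m \cap m') = j} \mu_p(X_m = X_{m'} = 1) = \sum_{m, m' \text{ s.t. } w(m \cap m') = j} p^{2l - j}.$$

When $N_{\mathcal{F}} \cdot p = x \cdot n^{1 - 1/l}$, first observe that $\nu \sim \left(\frac{x \cdot k!}{\#\mathcal{F}}\right)^{l} \cdot \frac{1}{((k-1) l + 1)!}$.
Second, two elements of $\mathcal{M}_0$ having $j$ common constraints share at least $k + (j-1)(k-1)$
variables, thus if $p = O(n^{1 - \frac{1}{l} - k})$ then
$\Delta_j = O\big(n^{2(k-1)l + 2 - k - (j-1)(k-1)} \cdot p^{2l - j}\big) = o(1)$.

Now, if $x = \varphi^{-1}(2\varepsilon)$, then for $n$ large enough, $1 - \mu_p\big(\bigcap_{m \in \mathcal{M}_0} X_m = 0\big) \ge
\varphi(x) - \varepsilon = \varepsilon$. From (4), we conclude that $\mu_p(\mathrm{UNSAT}(\mathcal{F})) \ge \varepsilon$ when $N_{\mathcal{F}} \cdot p =
\varphi^{-1}(2\varepsilon) \cdot n^{1 - 1/l}$.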



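Theorem 3.2. Let $\mathcal{F}$ be a multiset of constraint functions. If $\mathcal{F}$ does not contain
any function depending on one component but contains a function depending on a
2-XOR-relation, i.e., if (P1) holds but (P2) does not, let $l_{\mathcal{F}} = \min\{w(m) \,/\, m \in
\mathcal{M}^2_{\mathcal{F}}\}$; then for every $0 < \varepsilon < 1$, and $n$ large enough:

$$\frac{\varepsilon^{1/2}}{2 k \exp(k)} \cdot n \le N_{\mathcal{F}} \cdot p_\varepsilon(n) \le \psi^{-1}(2\varepsilon) \cdot n,$$

where $\psi(x) = 1 - \exp\left(-\left(\frac{x \cdot k!}{\#\mathcal{F}}\right)^{l_{\mathcal{F}}} \cdot \frac{1}{((k-1) l_{\mathcal{F}})!}\right)$.

Therefore, SAT($\mathcal{F}$) has a coarse transition occurring at the scale $n$.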



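Proof (sketch): Note that $l_{\mathcal{F}}$ is well defined from Proposition 2.1 and that $2 \le l_{\mathcal{F}} \le
3$. As above the lower bound of the scale of the phase transition is obtained from
the second point of Proposition 2.1. Here we get that if $\frac{N_{\mathcal{F}} \cdot p \cdot k \exp(k)}{n} \le \frac{\varepsilon^{1/2}}{2}$,
then $\mu_p(\mathrm{UNSAT}(\mathcal{F})) \le \varepsilon$.

For the upper bound, when $N_{\mathcal{F}} \cdot p = x \cdot n$, the parameters in the Janson inequality
verify $\nu \sim \left(\frac{x \cdot k!}{\#\mathcal{F}}\right)^{l} \cdot \frac{1}{((k-1) l)!}$ and $\Delta_j = O(n^{-1})$, and thus the conclusion follows. □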



4. Sharp thresholds 

According to the previous section, in order to get a complete classification for the
nature of the threshold for SAT($\mathcal{F}$) it remains to deal with the following conjecture,
which we stated in [4].

Conjecture 4.1. [4] Let $\mathcal{F}$ be a multiset of constraint functions. If $\mathcal{F}$ does not
contain any constraint function strongly depending on one component or on a
2-XOR-relation, then SAT($\mathcal{F}$) exhibits a sharp threshold.

In [5] we gave a sharpness criterion specified for random CSPs. 

Theorem 4.1. [5] Let $\mathcal{F}$ be a multiset of constraint functions. If the three following
conditions are verified, then the monotone property SAT($\mathcal{F}$) has a sharp threshold.







(D0) For each $c \in (0,1)$, $p_c(n) = O(n^{1-k})$.
(D1) For every $m$ minimal for UNSAT($\mathcal{F}$), $\#\mathrm{Var}(m) \le (k-1) \cdot w(m) - 1$.
(D2) For each $c \in (0,1)$, for each $t$, for all $\delta = (\delta_1, \dots, \delta_t) \in \{0,1\}^t$ and all
$\gamma > 0$

$$\mu_{p_c(n)}\big(s \notin Q_\delta,\ \#A_\delta(s) \ge \gamma \cdot n^{k-1}\big) = o(1),$$

$Q_\delta$ denoting the property for $s \in$ CSP($\mathcal{F}$) of having no satisfying assignment
with $x_1 = \delta_1, \dots, x_t = \delta_t$,

$A_\delta(s)$ denoting, for $s \notin Q_\delta$, the set of constraints $C$ having at least
one variable in $\{x_1, \dots, x_t\}$ and such that $s \cup \{C\} \in Q_\delta$.

If both (P1) and (P2) hold, then the first two conditions are verified (see [4]).
Verifying the last condition is the most challenging task. In [5] we verified it in the
restricted framework of symmetric constraints (for which, roughly speaking, the
truth values 0 and 1 have the same weight). The general case presented here will
require the use of a powerful combinatorial tool: supersaturated hypergraphs. We
use an ordered version of a theorem of Erdős and Simonovits about supersaturated
hypergraphs [9], which has already been used for obtaining sharp threshold results
for Ramsey properties of random graphs by Friedgut and Krivelevich [11].

Let us consider $h$-uniform hypergraphs. Let $K_h(m, \dots, m)$ denote the following
generalization of the complete bipartite graph. Fix $h$ disjoint sets of vertices
$V_1, \dots, V_h$, where each $V_i$ has $m$ elements, and take all those $h$-tuples $\{x_1, \dots, x_h\}$
such that for every $i$, $x_i \in V_i$. We are considering graphs on the set of vertices
$\{x_1, \dots, x_n\}$. We say that two disjoint subsets of vertices $A$ and $B$ verify $A < B$
if for all $x_i$ in $A$ and all $x_j$ in $B$ we have $i < j$. Now we define an ordered copy of
$K_h(m, \dots, m)$ as an $h$-uniform hypergraph defined as above with $V_1 < \dots < V_h$.
Thus, the ordered version of the theorem from Erdős and Simonovits about
supersaturated uniform hypergraphs [9, Corollary 2, page 184] can be stated as follows.

Theorem 4.2. [9] Given $c > 0$, there exists a $c' > 0$ such that if an $h$-uniform
hypergraph over $n$ vertices $\{x_1, \dots, x_n\}$ has at least $c\,n^h$ hyperedges, then it
contains at least $c' n^{mh}$ ordered copies of $K_h(m, \dots, m)$.
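
To make the notion of an ordered copy concrete in the graph case $h = 2$, the following Python sketch counts the pairs $(A, B)$ with $|A| = |B| = m$, $A < B$, and every edge between $A$ and $B$ present. It is a brute-force illustration on vertices labelled $1, \dots, n$; the function name is hypothetical and the method is only practical for small graphs.

```python
from itertools import combinations

def count_ordered_bicliques(n, edges, m):
    """Count ordered copies of K_2(m, m): vertex sets A < B of size m, fully joined."""
    edge_set = {frozenset(e) for e in edges}
    count = 0
    for A in combinations(range(1, n + 1), m):
        # A is sorted, so A[-1] is its largest vertex; B must lie strictly above it.
        for B in combinations(range(A[-1] + 1, n + 1), m):
            if all(frozenset((a, b)) in edge_set for a in A for b in B):
                count += 1
    return count
```

In a complete graph every $2m$-subset of vertices yields exactly one ordered copy, so the count is $\binom{n}{2m}$, of order $n^{mh}$ with $h = 2$, matching the exponent in Theorem 4.2.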

This result enables us to prove the following: 

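Lemma 4.3. If (P1) holds, so does (D2).

Proof:

For more readability we will perform the proof in the special case $k = 3$; it
will be clear that it is extendable to any $k \ge 3$. Recall that

$$A_\delta(s) = \{C = (g, i_1 < i_2 < i_3) \,/\, i_1 \le t,\ g \in \mathcal{F} \text{ such that } s \cup C \text{ has no satisfying assignment with } x_1 = \delta_1, \dots, x_t = \delta_t\}$$

and that we have to prove that if (P1) holds, then for all $\gamma > 0$:

$$\mu_{p_c(n)}\big(s \notin Q_\delta,\ \#A_\delta(s) \ge \gamma \cdot n^2\big) = o(1).$$

It is sufficient to prove that for every $\varepsilon = 0$ or $1$, for every $f \in \mathcal{F}$ and all
$\gamma' > 0$:

$$\mu_{p_c(n)}\big(s \notin Q_\delta,\ \#A_\delta^{f,\varepsilon}(s) \ge \gamma' \cdot n^2\big) = o(1),$$

where

$$A_\delta^{f,\varepsilon}(s) = \{C = (f, i_1 \le t < i_2 < i_3) \text{ such that } s \cup C \in Q_\delta \text{ and } \delta_{i_1} = \varepsilon\}.$$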







The strategy will be as follows. First, for $s \notin Q_\delta$, let us consider the following
set:

$$B_\delta(s) = \{(C_1 = (g, i_1 < i_2 < i_3),\ C_2 = (g', i_4 < i_5 < i_6)) \text{ such that } g, g' \in \mathcal{F},\ s \cup \{C_1, C_2\} \in Q_\delta\}.$$

We know that the probability that $B_\delta(s)$ is dense in the set of conjunctions of two
clauses is negligible (see [5, Lemma 5.2]):

For all $\nu > 0$, $\mu_{p_c(n)}\big(s \notin Q_\delta,\ \#B_\delta(s) \ge \nu \cdot n^6\big) = o(1)$.

Therefore, in order to prove our lemma we will prove that for every $\gamma' > 0$,
there exists some $\nu > 0$ such that:

$$\mu_{p_c(n)}\big(s \notin Q_\delta,\ \#A_\delta^{f,\varepsilon}(s) \ge \gamma' \cdot n^2\big) \le \mu_{p_c(n)}\big(s \notin Q_\delta,\ \#B_\delta(s) \ge \nu \cdot n^6\big), \qquad (5)$$

thus proving the lemma.

Hence, the trick is to provide a relationship between the cardinality of $A_\delta^{f,\varepsilon}(s)$
and the one of $B_\delta(s)$. First recall that $\mathcal{F}$ is interesting, that is, for every $\varepsilon = 0, 1$,
there exists $g_\varepsilon \in \mathcal{F}$ such that $g_\varepsilon(1-\varepsilon, 1-\varepsilon, 1-\varepsilon) = 0$. Moreover by assumption
$f$ does not depend on one component, therefore there exist $\alpha, \beta \in \{0,1\}$ such
that $f(\varepsilon, \alpha, \beta) = 1$. These two values $\alpha$ and $\beta$, together with the corresponding
functions $g_\alpha$ and $g_\beta$, will be of use later on.

With $A_\delta^{f,\varepsilon}(s)$ we associate a graph $G_\delta(s)$: the set of vertices is $\{x_{t+1}, \dots, x_n\}$,
and for each constraint $C = (f, i_1 \le t < i_2 < i_3) \in A_\delta^{f,\varepsilon}(s)$ we create the edge
$\{x_{i_2}, x_{i_3}\}$. Consider in $G_\delta(s)$ an ordered copy of the complete bipartite graph $K_{3,3}$,
whose bipartition $A = \{x_{j_1}, x_{j_2}, x_{j_3}\}$, $B = \{x_{j_4}, x_{j_5}, x_{j_6}\}$ verifies $A < B$. Then, we
claim that $\{(g_\alpha, j_1 < j_2 < j_3), (g_\beta, j_4 < j_5 < j_6)\} \in B_\delta(s)$. Indeed, in order to get
a contradiction suppose that $s' = s \cup (g_\alpha, j_1 < j_2 < j_3) \cup (g_\beta, j_4 < j_5 < j_6) \notin Q_\delta$.
Then, $s'$ has a satisfying assignment $I$ with $x_1 = \delta_1, \dots, x_t = \delta_t$. By the choice
of $g_\alpha$, $I$ assigns at least one of the literals out of $\{x_{j_1}, x_{j_2}, x_{j_3}\}$ to $\alpha$; w.l.o.g. let
us suppose that $I(x_{j_1}) = \alpha$. In the same way we can suppose that $I(x_{j_4}) = \beta$.
Thus, $I$ satisfies the constraint $f(\varepsilon, x_{j_1}, x_{j_4})$ since by assumption $f(\varepsilon, \alpha, \beta) = 1$.
This contradicts the fact that $\{x_{j_1}, x_{j_4}\}$ is an edge from $G_\delta$. Now, if $G_\delta(s)$ is
dense, then according to Theorem 4.2 (with $h = 2$ and $m = 3$) there exists $\rho > 0$
such that $G_\delta(s)$ contains at least $\rho \cdot n^6$ ordered copies of the complete bipartite
graph $K_{3,3}$. Therefore, the one-to-one correspondence we have established between
the ordered copies of $K_{3,3}$ in $G_\delta(s)$ and $B_\delta(s)$ proves that if $\#A_\delta^{f,\varepsilon}(s) \ge \gamma' \cdot n^2$,
then $\#B_\delta(s) \ge \rho \cdot n^6$ for some $\rho > 0$. Therefore, we have proved (5), the desired
inequality.

The proof can be extended to any $k \ge 3$. In the general case $B_\delta(s)$ is formed
with conjunctions of $(k-1)$ $k$-clauses. The graph $G_\delta(s)$ is then a $(k-1)$-uniform
hypergraph that contains at least $\gamma' \cdot n^{k-1}$ hyperedges and from which every ordered
copy of $K_{k-1}(k, \dots, k)$ provides an element of $B_\delta(s)$.

Finally we have proved the following theorem, which completes our classifi- 
cation. 







Theorem 4.4. Let $\mathcal{F}$ be a multiset of constraint functions. If $\mathcal{F}$ does not contain
any constraint function strongly depending on one component or on a 2-XOR-relation,
i.e., if both (P1) and (P2) hold, then SAT($\mathcal{F}$) exhibits a sharp threshold
at scale $n$.



5. Conclusion 

We have proved that generalized satisfiability problems provide a robust and challenging
framework to take a unified look at the relationship between complexity
and probabilistic behavior. Besides capturing the usual satisfiability problems
like $k$-SAT, $k$-XOR-SAT, 1-in-$k$-SAT and Not-All-Equal-$k$-SAT, this model (where
$\mathcal{F}$ is a multi-set) also captures the problems (2+p)-SAT (see [14]), XOR2SAT,
1-in-(2+p)-SAT and Not-All-Equal-(2+p)-SAT (see [16]), which interpolate smoothly
between P and NP by mixing together a polynomial and an NP-complete
problem. On the one hand our classification re-proves that (2+p)-SAT exhibits a
sharp threshold (see [1]) and shows that XOR2SAT, in which 3-clauses and 3-
XOR-clauses are mixed, also does. On the other hand, for $p < 1$, it proves that
1-in-(2+p)-SAT and Not-All-Equal-(2+p)-SAT both have a coarse threshold.

Of course, the techniques developed here can be used to establish sharpness 
results for CSPs over finite domains (not only Boolean), see for example [6]. The 
classification of the nature of the transition for a large class of random CSPs over 
finite domains remains an interesting and challenging task. 



References 

[1] D. Achlioptas, L. Kirousis, E. Kranakis and D. Krizanc. Rigorous results for random
(2+p)-SAT. Theoretical Computer Science, 265(1-2):109-129, 2001.

[2] N. Alon and J. Spencer. The Probabilistic Method. Wiley, New York, 1992.

[3] B. Bollobás. Random graphs. Academic Press, 1985.

[4] N. Creignou and H. Daudé. Generalized satisfiability problems: Minimal elements
and phase transitions. Theoretical Computer Science, 302(1-3):417-430, 2003.

[5] N. Creignou and H. Daudé. Combinatorial sharpness criterion and phase transition
classification for random CSPs. In Proceedings of the 6th International Symposium
on Theory and Applications of Satisfiability Testing, SAT'2003 Santa Margherita,
pages 81-87, 2003. (To appear in a full version in Information and Computation).

[6] N. Creignou, H. Daudé and J. Franco. A sharp threshold for the Renamable Horn
and the q-Horn properties. Submitted for publication, 2004.

[7] N. Creignou, S. Khanna and M. Sudan. Complexity classifications of Boolean
constraint satisfaction problems. SIAM Monographs on Discrete Mathematics and
Applications, 2001.

[8] O. Dubois, R. Monasson, B. Selman and R. Zecchina. Editorial. Theoretical
Computer Science, 265(1-2), 2001.

[9] P. Erdős and M. Simonovits. Supersaturated graphs and hypergraphs. Combinatorica,
3(2):181-192, 1982.

[10] E. Friedgut, with an appendix by J. Bourgain. Sharp thresholds of graph properties,
and the k-sat problem. Journal of the A.M.S., 12(4):1017-1054, 1999.

[11] E. Friedgut and M. Krivelevich. Sharp thresholds for Ramsey properties of random
graphs. Random Structures and Algorithms, 17(1):1-19, 2000.

[12] S. Kirkpatrick and B. Selman. Critical behavior in the satisfiability of random
Boolean expressions. Science, 264:1297-1301, 1994.







[13] M. Molloy. Models for random constraint satisfaction problems. SIAM Journal on
Computing, 32(4):935-949, 2003.

[14] R. Monasson, R. Zecchina, S. Kirkpatrick, B. Selman and L. Troyansky. 2+p-SAT:
Relation of typical-case complexity to the nature of the phase transition. Random
Structures and Algorithms, 15(3-4):414-435, 1999.

[15] T.J. Schaefer. The complexity of satisfiability problems. In Proceedings 10th STOC,
San Diego (CA, USA), pages 216-226. Association for Computing Machinery, 1978.

[16] T. Walsh. The interface between P and NP: COL, XOR, NAE, 1-in-k, and Horn
SAT. In Proceedings of the Eighteenth National Conference on Artificial Intelligence,
AAAI'2002 Edmonton, pages 695-700, 2002.

Nadia Creignou
LIF, UMR CNRS 6166, Université de la Méditerranée, 163, avenue de Luminy,
13 288 Marseille, France
creignou@lif.univ-mrs.fr

Hervé Daudé
LATP, UMR CNRS 6632, Université de Provence, 39, rue Joliot-Curie,
13 453 Marseille, France
daude@gyptis.univ-mrs.fr







Stochastic Chemical Kinetics with Energy 
Parameters 

Guy Fayolle, Vadim Malyshev, and Serguei Pirogov 



ABSTRACT: We introduce new models of energy redistribution in stochas- 
tic chemical kinetics with several molecule types and energy parameters. The main 
results concern the situations when there are product form measures. Using a prob- 
abilistic interpretation of the related Boltzmann equation, we find some invariant 
measures explicitly and we prove convergence to them. 



1. Introduction 

Metabolic pathways in molecular biology are chains or networks of chemical reac- 
tions providing redistribution of energy, in particular synthesis of ATP molecules, 
universal energy stocks in cells. Here we elaborate simple models of energy redis- 
tribution. According to a classical approximation, the energy of a molecule can be 
subdivided in two parts: internal (chemical) energy and kinetic energy. The model 
is the following. 

Assume that there are $V$ molecule types $v \in \{1, \dots, V\}$, with $n_v(t)$ molecules of
type $v$ at time $t$. Types $v$ can be interpreted as chemical substances with different
formulas, different isomers of the same formula, or even as different energy levels
(spectrum) of the same molecule.

The total number of molecules $M = \sum_v n_v(t)$ will be conserved. A molecule may
be characterized by a pair $(v, T)$, $v = 1, \dots, V$, where $T \in \mathbb{R}_+$ is the kinetic energy
of the molecule. Then each molecule of type $v$ at time $t$ has energy

$$E(t) = I(v) + T(t),$$

where $I(v)$ is the internal (or chemical) energy of any molecule of type $v$, $T(t)$
being the kinetic energy of a concrete molecule at time $t$. Thus, for any $v$, $t$, the $I(v)$
are fixed numbers and the $T(t)$ are random.

We use the approach usually referred to as stochastic chemical kinetics. It appeared
in physical papers, see [5], but was also explored by mathematicians
for many models with small $V$ (see e.g. the reviews [7, 4]). However these models
did not consider any energy parameter. Independently of this, Kac [3] considered
a beautiful model with mean field collisions. Deeper results on this model have
appeared even recently, see [1]. However, in Kac's model molecules were characterized only
by kinetic energies, that is $V = 1$. Our model can be considered as a mixture of
these two: there are molecule types and an energy parameter.

The plan of the paper is as follows. In section 2, we introduce our probabilistic 
microscopic model and provide the corresponding Boltzmann type equation. Proof 
of the finite microtime scaling limit convergence to this equation uses standard 
technical tools and will be published elsewhere. In section 3 we get deeper results 
for the one type case with uniform scattering: find invariant measures and prove 







convergence of the Boltzmann equation for large macrotime. In section 4 we provide 
many examples, with similar results for multi-type models. 



2. Finite time scaling limit 

Unless otherwise stated, we consider a system of binary reactions of the form
$A + B \to C + D$. We assume energy conservation and random momentary collisions,
that is, when a pair of different molecules $(v, T)$, $(v', T')$ collide at time $t$ then a
new pair $(v_1, U)$, $(v_1', U')$ appears at time $t + 0$, so that

$$I(v) + T + I(v') + T' = I(v_1) + U + I(v_1') + U'.$$

Obviously, the reaction is possible only if

$$I(v) + T + I(v') + T' \ge I(v_1) + I(v_1'). \qquad (1)$$

We define the following continuous time Markov chain. The state is an array of $V$
vectors $((v, T_i), i = 1, \dots, n_v)$, $v \in \{1, \dots, V\}$. Thus, their total length $M = \sum_{v=1}^{V} n_v$
is conserved, but not necessarily $n_v$. The order of components in each vector
$((v, T_i), i = 1, \dots, n_v)$ does not play any role, so that we will consider only functions
symmetric in the vector coordinates.

On the time interval $(t, t + dt)$, each pair of molecules $(v, T)$, $(v', T')$ has a collision
with probability $\frac{1}{M} a_{vv'}(T, T')\,dt$. The functions $a_{vv'}(x, y)$ are assumed to be
bounded and smooth on $\mathbb{R}_+^2$. As a result of this collision, some pair $(v_1, U)$, $(v_1', U')$
appears, provided that condition (1) holds for at least one pair $(v_1, v_1')$. Otherwise
nothing occurs. The distribution of the new pair is defined by the rules listed
hereafter. For any $v_1, v_1', v, v'$, the conditional densities

$$P((v_1, U), v_1' \mid (v, T), (v', T')) \ge 0$$

are supposed to satisfy the following properties.

(i) If

$$I(v) + T + I(v') + T' < I(v_1) + I(v_1'),$$

then

$$P((v_1, U), v_1' \mid (v, T), (v', T')) = 0.$$

(ii) For any $v, v', v_1, v_1', T, T'$, the density function

$$f(U) = P((v_1, U), v_1' \mid (v, T), (v', T'))$$

is defined on the interval $I = [0, I_v + T + I_{v'} + T' - I(v_1) - I(v_1')]$ and

$$\sum_{v_1, v_1'} \int_I P((v_1, U), v_1' \mid (v, T), (v', T'))\,dU = 1.$$

Thus the distribution of the triple $(v_1, U, v_1')$ is entirely defined by

$$P((v_1, U), v_1' \mid (v, T), (v', T'))$$

and $U' = I(v) + T + I(v') + T' - (I(v_1) + U + I(v_1'))$.
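
A single collision obeying the energy balance above can be sketched in a few lines of Python. The kernel $P$ is left general in the text; the sketch below assumes, purely for illustration, that the product types $(v_1, v_1')$ are chosen uniformly among the pairs allowed by condition (1) and that $U$ is uniform on the admissible interval. The function name `collide` and the dictionary `I` of internal energies are hypothetical.

```python
import random

def collide(I, v, T, v_prime, T_prime, rng):
    """One collision (v,T),(v',T') -> (v1,U),(v1',U') conserving total energy.

    I maps each molecule type to its internal energy. Assumed kernel: uniform
    choice of an allowed product pair, then U uniform on the admissible interval.
    """
    total = I[v] + T + I[v_prime] + T_prime
    # Product pairs compatible with condition (1).
    allowed = [(a, b) for a in I for b in I if I[a] + I[b] <= total]
    if not allowed:
        return (v, T), (v_prime, T_prime)  # nothing occurs
    v1, v1_prime = rng.choice(allowed)
    free = total - I[v1] - I[v1_prime]     # kinetic energy left to share
    U = rng.uniform(0.0, free)
    return (v1, U), (v1_prime, free - U)
```

Since the incoming pair of types always satisfies condition (1), the list of allowed pairs is never empty when kinetic energies are non-negative; the guard is kept only for safety.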

Hence, for $V$ finite sets of kinetic energies, one for each type $v = 1, \dots, V$, we have
defined a Markov process, which will be denoted by $\mathcal{L}_M$. It is worth remarking that, when
the total energy $U$ is fixed, $\mathcal{L}_M$ has a compact state space. Then, under some
nondegeneracy conditions on $a$ and $P$, this Markov chain for fixed $M$ approaches,
as $t \to \infty$, its unique stationary distribution. Our goal will be to study,







under some conditions, the scaling limit $M \to \infty$ for fixed $t$, and also the large
time limit $t \to \infty$.

Let $n_v^{(M)}(A, t)$ denote the number of type $v$ molecules at time $t$ having kinetic
energy $T$ in the set $A \subset \mathbb{R}_+$. In the limit $M \to \infty$ we have to impose initial
conditions at time zero

$$\lim_{M \to \infty} \frac{n_v^{(M)}(A, 0)}{M} = \int_A p_v(x, 0)\,dx,$$

for some nonnegative functions $p_v(x, 0)$, $\sum_v \int p_v(x, 0)\,dx = 1$, called concentrations.
Our goal is to prove that, as $M \to \infty$, the sequence of Markov processes
$\mathcal{L}_M$ converges to some deterministic evolution $L$ of the concentrations. We now
state our first result.



Theorem 2.1. For any $A$ and $t$, there exist deterministic limits (in probability)

$$\lim_{M \to \infty} \frac{n_v^{(M)}(A, t)}{M} = \int_A p_v(x, t)\,dx,$$

where the $p_v(x, t)$'s are some non-negative functions satisfying the following
Boltzmann type equations

( 2 ) 

- (a;, z)P{{v, y) , i;'l (vi , x) , {v[ , z))p^^ {x, t)p^'^ {z, i)] dydz, 

with the initial condition py{x,0). 



Other reaction types. Quite similarly one can consider other types of reactions.
For example consider the reaction $A \to B + C$. In this case on the time interval
$(t, t + dt)$ each molecule $(v, T)$ is transformed with probability $a_v(T)\,dt$ into two
molecules (note that the scaling is different here). The distribution of the products
$(v_1, U)$, $(v_1', U')$ is defined by similar kernels $P((v_1, U), v_1' \mid (v, T))$ under the
condition

$$I_{v_1} + I_{v_1'} \le I_v + T.$$



3. One type case 

3.1. Probabilistic interpretation 

We consider in this section the particular situation with only one molecule type
$v$. It will also be assumed that the rates $a(T,T') = a_{vv}(T,T') = a$ are constant and
that the conditional probabilities $P(U \mid T,T')$ are uniform on the interval $[0, T+T']$.
It turns out that the limiting stationary distribution can be found explicitly. Indeed,
equation (2) can be rewritten as

$$\frac{\partial p(x,t)}{\partial t} = a\int_x^\infty \frac{ds}{s}\int_0^s p(u,t)\,p(s-u,t)\,du - a\,p(x,t). \quad (3)$$

[Similar equations appeared in [2] in a different context.] Now one can guess a
fixed point: it is $p(x) = \beta e^{-\beta x}$, but it can also be obtained from a very clear
probabilistic picture.

Let us consider the particle dynamics, that is the chain $\mathcal{L}_M$, the states of which
are finite subsets of $R_+$ with $M$ elements.

Take first the case $M = 2$. Define the chains $\mathcal{L}_2(U)$ as the restriction of $\mathcal{L}_2$ to
states with total energy $U$. Then the chains $\mathcal{L}_2(U)$ are irreducible and nilpotent:
that is, already after the first jump we get the stationary distribution $\pi_2(U)$, with
$T$ uniformly distributed on $[0, U]$ and $T' = U - T$. Hence, for any initial condition,
$\mathcal{L}_2$ is a mixture of the $\mathcal{L}_2(U)$. We see that, for any density $f(U)$, the measure

$$\int_{R_+} \pi_2(U) f(U)\,dU$$

is an invariant measure for $\mathcal{L}_2$. Indeed one of these invariant measures is of greatest
interest to us. Let the random vector $(\xi_1,\xi_2)$ on $R_+^2$ be defined by the measure
$\mu_{2,\beta}$, such that the two random variables $\xi_1, \xi_2$ on $R_+$ are i.i.d. with density
$p(x) = \beta\exp(-\beta x)$. Consider a new random vector $(\eta_1,\eta_2)$, where $\eta_1$ is picked at random
on the interval $[0, \xi_1+\xi_2]$ and $\eta_2 = \xi_1 + \xi_2 - \eta_1$. This defines a transformation $W$ of
measures, $\mu_{2,\beta} \to W\mu_{2,\beta}$. In fact we have the following

Lemma 3.1. The measure $\mu_{2,\beta}$ is invariant with respect to $W$, that is

$$W\mu_{2,\beta} = \mu_{2,\beta}. \quad (4)$$

Proof: Immediate, since the density of $\xi_1+\xi_2$ is $\beta^2 x\exp(-\beta x)$. Then picking
a random point on the interval $[0, x]$ yields $\mu_{2,\beta}$, whence equality (4) follows.

In addition, (4) gives

$$p(u) = \int_u^\infty \frac{dx}{x}\int_0^x p(y)\,p(x-y)\,dy,$$

which is exactly the stationary form of equation (3).
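The invariance stated in Lemma 3.1 can be checked empirically. The sketch below is only an illustration under stated assumptions: the function names, the sample size and the seed are hypothetical choices, not part of the paper. It draws two independent exponential energies, redistributes their sum uniformly, and lets one compare the law of the new first energy with the exponential density.

```python
import math
import random


def redistribute(xi1, xi2, rng):
    """One collision: split the total energy uniformly between the two particles."""
    total = xi1 + xi2
    eta1 = rng.uniform(0.0, total)
    return eta1, total - eta1


def simulate(beta=2.0, n=200_000, seed=12345):
    """Sample eta_1 after one collision, starting from i.i.d. Exp(beta) energies."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        xi1 = rng.expovariate(beta)  # expovariate takes the rate beta
        xi2 = rng.expovariate(beta)
        eta1, _eta2 = redistribute(xi1, xi2, rng)
        samples.append(eta1)
    return samples


def tail_fraction(samples, t):
    """Empirical estimate of P(eta_1 > t)."""
    return sum(1 for s in samples if s > t) / len(samples)


if __name__ == "__main__":
    beta = 2.0
    samples = simulate(beta=beta)
    # If the exponential law is invariant, these two numbers should be close.
    print(tail_fraction(samples, 0.5), math.exp(-beta * 0.5))
```

The empirical tail of $\eta_1$ should agree with $e^{-\beta t}$ up to Monte Carlo error, which is what invariance of $\mu_{2,\beta}$ predicts for the marginal.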

For $M \ge 3$, the Markov chain $\mathcal{L}_M$ also has irreducible components $\mathcal{L}_M(U)$,
consisting of all states $(T_1, \ldots, T_M)$ with $T_1 + \cdots + T_M = U$. For fixed $M$ and $U$
the invariant measure of the chain $\mathcal{L}_M(U)$ is the uniform measure on the simplex
$T_1 + \cdots + T_M = U$. An invariant measure on $\mathcal{L}_M$ can be found as follows.
Take $M$ independent particles, each having density $\beta e^{-\beta x}$ on $R_+$, and let
$\mu_{M,\beta}$ denote their joint distribution.

Lemma 3.2. The measure $\mu_{M,\beta}$ is invariant for $\mathcal{L}_M$.

Proof: It follows from the previous lemma, because the generator of $\mathcal{L}_M$ is the
sum of generators corresponding to all pairs $(i,j)$, $i,j = 1, \ldots, M$, $i \ne j$.

Remark 3.1. One can show that $\mathcal{L}_M(U)$ is reversible, by using the classical
Kolmogorov reversibility criterion for Markov chains with transition rates $\lambda_{\alpha\beta}$, namely

$$\lambda_{\alpha_1\alpha_2}\lambda_{\alpha_2\alpha_3}\cdots\lambda_{\alpha_k\alpha_1} = \lambda_{\alpha_1\alpha_k}\lambda_{\alpha_k\alpha_{k-1}}\cdots\lambda_{\alpha_2\alpha_1}.$$

See related questions in [8].
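The $M$-particle picture can be illustrated by a small simulation. This is a sketch under assumptions, not the authors' construction: it follows only the embedded jump chain (one uniformly chosen pair per step, ignoring the exponential waiting times), and the names `run_pair_dynamics` and `tail_fraction` are hypothetical. Starting from equal energies with mean 1, the total energy stays fixed while the empirical energy distribution approaches the exponential law with $\beta = 1$ for large $M$.

```python
import math
import random


def run_pair_dynamics(energies, steps, rng):
    """Jump chain of the one-type model: pick a random pair of particles and
    redistribute their total energy uniformly between them."""
    e = list(energies)
    m = len(e)
    for _ in range(steps):
        i, j = rng.sample(range(m), 2)  # two distinct particle indices
        total = e[i] + e[j]
        e[i] = rng.uniform(0.0, total)
        e[j] = total - e[i]
    return e


def tail_fraction(values, t):
    """Fraction of particles with energy above t."""
    return sum(1 for x in values if x > t) / len(values)


if __name__ == "__main__":
    rng = random.Random(2024)
    final = run_pair_dynamics([1.0] * 2000, 40_000, rng)
    # The total energy is conserved; the tail should be close to exp(-1).
    print(sum(final), tail_fraction(final, 1.0), math.exp(-1.0))
```

For fixed $M$ and $U$ the chain stays on the simplex $T_1 + \cdots + T_M = U$, so the exponential shape only appears in the marginal of a single particle as $M$ grows.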



3.2. Convergence for Boltzmann equation 

According to the above section, we have

$$\lim_{M\to\infty}\lim_{t\to\infty} \frac{n^{(M)}(A,t)}{M} = \int_A \beta e^{-\beta x}\,dx.$$

We will consider now the quantity $\lim_{t\to\infty}\lim_{M\to\infty}$.

Theorem 3.3. For the Boltzmann equation (2), for any initial condition $p(x,0)$, we
have

$$\lim_{t\to\infty} p(x,t) = \beta e^{-\beta x}, \quad x > 0. \quad (5)$$







Proof: The sketch is the following. First, we prove in the next subsection, under
more general assumptions, that any initial distribution converges to some fixed
point. Secondly, we will show that there is only a one-dimensional manifold of fixed
points, namely $\beta e^{-\beta x}$, $0 < \beta < \infty$. This will conclude the proof, since $\beta$ itself is
uniquely determined by the initial mean energy

$$T(0) = \lim_{M\to\infty} \frac{1}{M}\sum_{i=1}^{M} T_i(0) = \frac{1}{\beta}.$$



3.3. Local equilibrium condition 

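We come back here to an arbitrary number of types. We will say that a positive
function $f(v,x)$ on $\{1,\ldots,V\} \times R_+$, with $\sum_v \int f(v,x)\,dx = C < \infty$, satisfies a local
equilibrium condition (LE) if, for any $\gamma, \gamma_1$,

$$\int_{\gamma',\gamma_1'} \big( w(\gamma,\gamma_1 \mid \gamma',\gamma_1')\,f(\gamma')f(\gamma_1') - w(\gamma',\gamma_1' \mid \gamma,\gamma_1)\,f(\gamma)f(\gamma_1) \big) = 0, \quad (6)$$

where we use the notation

$$\gamma = (v,x), \qquad \int_\gamma = \sum_v \int dx,$$

and

$$w(\gamma,\gamma_1 \mid \gamma',\gamma_1') = a_{v'v_1'}(x',x_1')\,P((v,x), v_1 \mid (v',x'),(v_1',x_1'))\,\delta\big(x_1 - (x' + x_1' + I_{v'} + I_{v_1'} - x - I_v - I_{v_1})\big).$$

One can assume $C = 1$. Then, in the one type case, this is tantamount to saying
that $\mathcal{L}_2$ has the invariant product form distribution $f(x)f(y)$.

The fixed point condition (FP)

$$\int_{\gamma_1,\gamma',\gamma_1'} \big( w(\gamma,\gamma_1 \mid \gamma',\gamma_1')\,f(\gamma')f(\gamma_1') - w(\gamma',\gamma_1' \mid \gamma,\gamma_1)\,f(\gamma)f(\gamma_1) \big) = 0, \quad (7)$$

valid for any $\gamma$, follows immediately from (6).

We shall say that $f(\gamma)$ satisfies a detailed balance condition (DB) whenever

$$w(\gamma,\gamma_1 \mid \gamma',\gamma_1')\,f(\gamma')f(\gamma_1') - w(\gamma',\gamma_1' \mid \gamma,\gamma_1)\,f(\gamma)f(\gamma_1) = 0, \quad (8)$$

for any $\gamma, \gamma', \gamma_1, \gamma_1'$. In the above one type example, the DB condition holds if one
chooses

$$f_0 = \beta e^{-\beta x}$$

for any positive $\beta$. Note that DB $\Rightarrow$ LE $\Rightarrow$ FP.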
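As a quick check of the last claim, one can write out the kernel in the one type case of Section 3.1. The explicit form of $w$ below is a sketch derived from the definition above under the assumptions of a constant rate $a$, a uniform kernel on $[0, x'+x_1']$ and equal internal energies; it is not a formula quoted from the text.

```latex
% One type, constant rate a, uniform redistribution of the total energy:
w(\gamma,\gamma_1 \mid \gamma',\gamma_1')
  = \frac{a}{x'+x_1'}\,\delta\bigl(x + x_1 - x' - x_1'\bigr),
  \qquad 0 \le x \le x'+x_1'.
% On the support of the delta function, x + x_1 = x' + x_1', hence w is symmetric:
w(\gamma,\gamma_1 \mid \gamma',\gamma_1') = w(\gamma',\gamma_1' \mid \gamma,\gamma_1).
% For f_0(x) = \beta e^{-\beta x} the two products coincide as well:
f_0(x')\,f_0(x_1') = \beta^2 e^{-\beta(x'+x_1')}
                   = \beta^2 e^{-\beta(x+x_1)} = f_0(x)\,f_0(x_1),
% so the two terms of (8) cancel pointwise, for every \beta > 0.
```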

Let us define the relative entropy of $f$ with respect to $f_0$, assuming both $f$ and $f_0$
are positive. From now on, $f_0$ will be fixed and therefore omitted in the notation, so
that

$$H(f) = H(f, f_0) = \int_\gamma f(\gamma)\,\log\frac{f_0(\gamma)}{f(\gamma)}. \quad (9)$$



Theorem 3.4. Assume that there exists some $f_0(\gamma) > 0$ satisfying the local
equilibrium condition. Then for any initial $f(\gamma,0)$ with $H(f(\cdot,0))$ finite, the function
$f(t) = f(\gamma,t)$, which is the solution of equation (2), does satisfy

$$\frac{dH(f)}{dt} \ge 0.$$

Moreover, as $t \to \infty$, $f(\gamma,t)$ tends to some fixed point $f_\infty$, which depends in general
on the initial data $f(\gamma,0)$. The LE condition holds for any stationary solution $f$, that
is for any fixed point of (2).



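Proof: The integrability of $\partial f/\partial t$ follows from (2), so that the following
conservation law holds

$$\int_\gamma \frac{\partial f(\gamma)}{\partial t} = 0. \quad (10)$$

Differentiating (9) and using (10), we get

$$\frac{dH(f)}{dt} = \int_\gamma \frac{\partial f(\gamma)}{\partial t}\,\log\left[\frac{f_0(\gamma)}{f(\gamma)}\right].$$

We rewrite condition (6) as

$$\int_{\gamma',\gamma_1'} w(\gamma,\gamma_1 \mid \gamma',\gamma_1')\,\frac{f_0(\gamma')f_0(\gamma_1')}{f_0(\gamma)f_0(\gamma_1)} = \int_{\gamma',\gamma_1'} w(\gamma',\gamma_1' \mid \gamma,\gamma_1),$$

and set for the sake of shortness $\int = \int_{\gamma,\gamma_1,\gamma',\gamma_1'}$. Then, for any function $f(\gamma)$, we
have

$$\int w(\gamma,\gamma_1 \mid \gamma',\gamma_1')\,\frac{f_0(\gamma')f_0(\gamma_1')}{f_0(\gamma)f_0(\gamma_1)}\,f(\gamma)f(\gamma_1) = \int w(\gamma',\gamma_1' \mid \gamma,\gamma_1)\,f(\gamma)f(\gamma_1),$$

or, after a change of variables,

$$\int w(\gamma',\gamma_1' \mid \gamma,\gamma_1)\,f(\gamma)f(\gamma_1) = \int w(\gamma,\gamma_1 \mid \gamma',\gamma_1')\,f(\gamma')f(\gamma_1').$$

Let $\varphi(\gamma) = \log\left[f_0(\gamma)/f(\gamma)\right]$. Then

$$\frac{dH(f)}{dt} = \int_\gamma \frac{\partial f(\gamma)}{\partial t}\,\varphi(\gamma)$$
$$= \int \varphi(\gamma)\,\big[w(\gamma,\gamma_1 \mid \gamma',\gamma_1')\,f(\gamma')f(\gamma_1') - w(\gamma',\gamma_1' \mid \gamma,\gamma_1)\,f(\gamma)f(\gamma_1)\big]$$
$$= \int \big[\varphi(\gamma) - \varphi(\gamma')\big]\,w(\gamma,\gamma_1 \mid \gamma',\gamma_1')\,f(\gamma')f(\gamma_1')$$
$$= \frac{1}{2}\int \big[\varphi(\gamma) + \varphi(\gamma_1) - \varphi(\gamma') - \varphi(\gamma_1')\big]\,w(\gamma,\gamma_1 \mid \gamma',\gamma_1')\,f(\gamma')f(\gamma_1').$$

Set for a while

$$\xi = \frac{f_0(\gamma)f_0(\gamma_1)}{f(\gamma)f(\gamma_1)}\cdot\frac{f(\gamma')f(\gamma_1')}{f_0(\gamma')f_0(\gamma_1')}, \qquad \alpha = \frac{f_0(\gamma')f_0(\gamma_1')}{f_0(\gamma)f_0(\gamma_1)},$$

so that

$$\log\xi = \varphi(\gamma) + \varphi(\gamma_1) - \varphi(\gamma') - \varphi(\gamma_1').$$

Then

$$\frac{dH(f)}{dt} = \frac{1}{2}\int \alpha\,\xi\log\xi\;w(\gamma,\gamma_1 \mid \gamma',\gamma_1')\,f(\gamma)f(\gamma_1).$$

On the other hand, from the LE condition,

$$\int \alpha\,\xi\,w(\gamma,\gamma_1 \mid \gamma',\gamma_1')\,f(\gamma)f(\gamma_1) = \int \alpha\,w(\gamma,\gamma_1 \mid \gamma',\gamma_1')\,f(\gamma)f(\gamma_1),$$

which yields

$$\frac{dH(f)}{dt} = \frac{1}{2}\int \alpha\,(\xi\log\xi - \xi + 1)\,w(\gamma,\gamma_1 \mid \gamma',\gamma_1')\,f(\gamma)f(\gamma_1) \ge 0,$$

since $\xi\log\xi - \xi + 1 \ge 0$ if $\xi > 0$, due to the convexity of $\xi\log\xi$.

Assume now that for some $f_0 > 0$ the local equilibrium condition holds. Then it
holds also for any other stationary solution $f$, i.e. satisfying $\partial f/\partial t = 0$. In fact, note
that $\xi\log\xi - \xi + 1 > 0$ if $f(\gamma)f(\gamma_1) > 0$, $w(\gamma,\gamma_1 \mid \gamma',\gamma_1') > 0$ and $\xi \ne 1$. Also, if $f$ is a
stationary solution of equation (2) then $\partial f/\partial t = 0$ and hence $dH/dt = 0$. It follows
that for any $\gamma,\gamma_1,\gamma',\gamma_1'$ such that $f(\gamma)f(\gamma_1) > 0$ and $w(\gamma,\gamma_1 \mid \gamma',\gamma_1') > 0$ we have
$\xi = 1$, that is

$$\frac{f(\gamma')f(\gamma_1')}{f(\gamma)f(\gamma_1)} = \frac{f_0(\gamma')f_0(\gamma_1')}{f_0(\gamma)f_0(\gamma_1)}.$$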

On the other hand, if $\partial f/\partial t = 0$, $f(\gamma) = 0$ and $w(\gamma,\gamma_1 \mid \gamma',\gamma_1') > 0$, then we get
$f(\gamma')f(\gamma_1') = 0$ as a consequence of equation (2). Thus, for any $\gamma,\gamma_1$, equation (6)
holds. Any solution $f(t)$ of equation (2) tends, as $t \to \infty$, to some stationary
solution $f_\infty$, which depends in general on the initial data $f(0)$. In fact, from the
proof of theorem 2.1, it follows that $f$ is a stationary solution, i.e. $\partial f/\partial t = 0$, if and
only if $dH/dt = 0$ (provided that (2) holds). This means that $H(f)$ is a Lyapunov
function. Consequently, the expected result follows from the general theory of
Lyapunov functions and the proof of the theorem is terminated.



3.4. Fixed points and conservation laws 



Now we will prove that, for any two fixed points $f_0, f$, the function $\log(f/f_0)$ is an
additive conservation law. Consider the equation

$$\frac{f(\gamma')f(\gamma_1')}{f(\gamma)f(\gamma_1)} = \frac{f_0(\gamma')f_0(\gamma_1')}{f_0(\gamma)f_0(\gamma_1)}. \quad (11)$$

For $f_0 \equiv 1$, we have

$$\frac{f(\gamma')f(\gamma_1')}{f(\gamma)f(\gamma_1)} = 1, \quad (12)$$

which shows that $\log f$ is an additive conservation law. Vice versa, if there is a set
$J$ of additive conservation laws such that

$$\eta_j(\gamma) + \eta_j(\gamma_1) = \eta_j(\gamma') + \eta_j(\gamma_1'), \quad j \in J,$$

then, for any constants $c, c_j$,

$$f(\gamma) = c\prod_{j\in J}\exp(c_j\eta_j(\gamma))$$

is a solution of (12). Note that additive conservation laws form a linear space.
Thus we have proved that any solution of (12) has this form. In the general case
(that is if $f_0 \not\equiv 1$), we have

$$\frac{f(\gamma)}{f_0(\gamma)} = c\prod_{j\in J}\exp(c_j\eta_j(\gamma)).$$

It is worth noticing that a nonzero additive conservation law for the chain $\mathcal{L}_M$ is
in fact unique, if the chains $\mathcal{L}_M(U)$ are irreducible, for all $U$.



4. Invariant measures for multi-type models 

Here we will analyze some cases with V > 1, when there exists an invariant measure 
having a product form. 

4.1. Binary reactions without type change 

Let for any t; = 1, . . . , V a density py{x) > 0 on Rj^ be given. Assume only reactions 
v^w v^w are possible, so that the n-^’s are conserved. Then one can introduce 
finite particle Markov chains Suppose in addition that, for any couple 

of types (u, w), 

avw{T^ T') = ayyj{T + T^), 

which means that the rates depend only on the sum of energies. 

We need the following definition. Fix a pair (u, w) of types and let ^y,^w be inde- 
pendent random variables with joint density Py{x)pyj{y). Denote 

^Pvpw ~ ^pvpwi^^vl'^) ~ = X')^W — y\^V = T) 

the corresponding conditional distributions, which will be called canonical kernels 
corresponding to the density array (pi, . . . , py). 

Let ^y^i^i — 1, . . . , n<y, stand for the energy of the i-th particle of type v. 

Theorem 4.1. Fix an array pi, . . . ,py and let a system of reactions with 

canonical kernels Pp^p^ be given. Then, for any rzi, . . . , ny, the invariant measures 
of ore such that the random variables ^y^i have independent distributions 

equal to py . In the thermodynamic limit, for any initial concentrations of types 
(ci, . . , ,cy) (here the concentrations of types do not change at all), the invariant 
energy distribution is unique and given by the independent densities py . Moreover, 
for any initial energy distribution, there is convergence to this invariant measure. 
Also, for any array (pi,...,py) with arbitrary rates OLyyjiU), there is only one 
system of kernels for which this array defines an invariant (product form) distri- 
bution, these kernels being canonical kernels. 

Proof: Any transition v,v' v,v' conserves U and the related measures. Hence, 

as for the convergence, the argument is similar to that in the previous section. The 
other statements follow directly from the definitions. 

When py{x) = the kernels are uniform on [0,T], as in the one type 

case study. Let denote such a kernel. An interesting situation depicted in the 
next remark arises when 

py{x) = /?„exp(-/?^,x), 

with different /?^,’s. 

Remark 4.1. All other cases can be reduced to the simplest one by the following 
transformation. Given any density p > 0 and any /? > 0, introduce the one-to-one 
transformation U — U{p,p) : such that, for any x G R^, 

px pUx 

/ p{y)dy= / j3e~f^ydy. 

Jo Jo 

Then 

P,^^{U-\p,,(5),U-\pn,,(d))P^{U{p,J),U{p^,l3)). 







4.2. Unary reactions 

Now we want to tackle examples in which the $n_v$'s are not conserved. Then, in
general, only

$$\mathcal{L}_M = \bigcup_{n_1+\cdots+n_V=M} \mathcal{L}_{n_1,\ldots,n_V}$$

is a Markov chain. In this subsection, we assume that unary reactions

$$v \to w$$

can take place with rates $a_{vw}$. Such reactions could be interpreted as isomer to
isomer transformations. In case $I_v > I_w$ the reaction $v \to w$ always occurs, and
the kinetic energy $T$ of the $v$-particle becomes the kinetic energy $I_v - I_w + T$ of the
$w$-particle. The reaction $w \to v$ however occurs only if $T - I_v + I_w > 0$, and in this case
the kinetic energy $T$ of the $w$-particle becomes the kinetic energy $T - I_v + I_w$ of the
$v$-particle.
Consider first the case without binary reactions. Define the following one-particle 
Markov chain: its states are all pairs (v,T), that is M = 1. Moreover, assume 
that there are only two types. Let Ii < I 2 . Consider a pair of densities pi,P 2 ? 
denote ^ 1,^2 corresponding random variables. We call this pair admissible if the 
conditional density of — (/2 — -fi), on the event {^1 > I 2 — h}, is equal to p 2 . 
One example is pi = p 2 = ^exp(— ^x), another being 

fO, for X < I 2 - h] 

PiW = \ 

[ P 2 (^ — ^2 + A ) 5 otherwise. 

Any invariant measure on { 1 , 2 } x can be written as 7 Ti( 1 , pi) + 7 T 2 ( 2 , P 2 ) with 
positive coefficients tt* such that tti + 7 T 2 = 1. We have for 7 ri, 7 T 2 the following 
equations 

/*oo 

niYiai2=7T2a2i, Yi= pi{x)dx. 

J12-I1 

This case exhibits the highest degree of reducibility, each class containing one or 
two elements: there is plenty of invariant measures - but this is clearly a very 
unnatural situation. For an arbitrary M with only two types, we have the product 
of M chains which again leads to a rather unnatural situation. 

When there are V > 2 types, each class also has a finite number of elements. It is 
then possible to order the internal energies, assuming for example 

A < ^2 < • • • < 

and also Oyyj > 0, V^;, w. If the full energy satisfies Im < U < Im-\-i,rn = 1, . . . , F 
(putting /rn+i = 00 ) then there are no possible jumps to the types m -f 1, . . . , F, 
so that the process evolves as a Markov chain with state space l,...,m 
and rates ayyj^v^w = 1, . . . ,m. Hence J^i^m ^^re restrictions of For m = 1, 
it becomes a trivial one-point Markov chain. Let 7Tm,v^'^ = 1, . . . denote the 
stationary probability of the state in We have tti^i = 1. 

Note that, if at time 0 the state is (1, U) and U has some density f(l7) in [7^, /m+i]? 
then the stationary distribution is defined by 7Tm,v and by the conditional density 
/ of the full energy. Thus everything is defined by the rates Oyyj and by /(J7), that 
is pi. Moreover, these quantities can be chosen arbitrarily. Setting for the sake of 
shortness 

we propose hereafter some examples. 

Shifts In this first example we take pi (x) ^ 0 if x < ly — Ii. Then each py is just 
a shift of Pi . 







Reversibility Analogously, a system (pi , . . . , py ) of densities will be said admissible 
if the following condition holds: for any v the pair (p^,p^ 4 -i) is admissible. Then 
it follows that each pair of densities (pi^pj)^i < j, is admissible. 

Theorem 4.2. If Ii <...< ly, all pv{x) are strictly positive and the system 
(pi, . . . ,py) is admissible then is reversible. 



Proof: Let fv{U) — py{U — ly) for U > ly and fy{U) — 0 for < ly We suppose 

the invariant distribution for the chain has a product form, each factor being 
given hy 'Kyfy{U). This means that for each m and for Im^U < /m+i 

m m 

i=l i=l 

for j = 1, . . . , m. Then admissibility means 



MU) = 



AifiiU), for U>Ii, 
fi{U) = 0, otherwise. 



Hence 

m m 

^ — TT j Aj ^ ^ aji^ ^ ^ j — m. 

i=l i=l 

Putting Pi = 'KiAi, it follows that 

m m 

=Pj'Yh ^ i ^ Vm = 1, . . . , K 

i=l i=l 

The comparison of these equations for m and m + 1 yields 



By induction this yields 

Pi ^ij Pj i ? 5 j 5 

which implies the announced reversibility of 

Exponent. In this third example, we also assume the system $(p_1, \ldots, p_V)$ of densities
is admissible, and moreover that, for some $p(T)$ and all $v$,

$$p_v = p.$$

Theorem 4.3. Suppose $V \ge 3$, and that the quantities $I_2 - I_1$ and $I_3 - I_1$ are
incommensurable. Then

$$p(T) = \beta\exp(-\beta T),$$

for some $\beta > 0$.

Proof: Admissibility implies that

$$p_2(T) = A_2\,p_1(T + I_2 - I_1),$$

$$p_3(T) = A_3\,p_1(T + I_3 - I_1).$$

If $p_1 = p_2 = p_3 = p$ then

$$p(T) = A_2\,p(T + x_2) = A_3\,p(T + x_3),$$

where $x_i = I_i - I_1$, $i = 2,3$, are incommensurable. But these last two equations
are compatible only if $A_2^{1/x_2} = A_3^{1/x_3}$ and $p(T) = \beta\exp(-\beta T)$, with

$$\beta = \frac{\log A_2 - \log A_3}{x_2 - x_3}.$$
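The easy direction of this argument can be verified numerically. The following sketch uses hypothetical helper names and arbitrarily chosen values of $\beta$, $x_2$ and $x_3$; it checks that the exponential density satisfies $p(T) = A\,p(T+x)$ with $A = e^{\beta x}$, and that $\beta$ is recovered by the formula above. It does not reproduce the uniqueness part, which relies on incommensurability.

```python
import math


def exp_density(beta, t):
    """Exponential density beta * exp(-beta * t) for t >= 0."""
    return beta * math.exp(-beta * t)


def admissibility_constants(beta, x2, x3):
    """Constants A_2, A_3 for which p(T) = A_i * p(T + x_i) holds when p is exponential."""
    return math.exp(beta * x2), math.exp(beta * x3)


def recover_beta(a2, a3, x2, x3):
    """beta = (log A_2 - log A_3) / (x_2 - x_3), as in the proof of Theorem 4.3."""
    return (math.log(a2) - math.log(a3)) / (x2 - x3)


if __name__ == "__main__":
    beta, x2, x3 = 1.7, 1.0, math.sqrt(2.0)  # x2 / x3 is irrational
    a2, a3 = admissibility_constants(beta, x2, x3)
    print(recover_beta(a2, a3, x2, x3))
    print(exp_density(beta, 0.3), a2 * exp_density(beta, 0.3 + x2))
```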







Energy dependence In the fourth example, the rates ay^ = ayyj{T) depend on the 
energy of the input particle v. To construct a model which will be needed later, 
consider a reversible Markov chain Vi on {1, ... , V} with stationary probabilities 
Py and rates byyj. Thus 

Pvb 

vw = Pwb wv 

For reactions v define the reaction rates as 

fO, if U < lyj, 

[([/ - Iw)^'^byyj, otherwise. 

Note that these rates are close to zero if the kinetic energy Tyj = U — 1^ of the 
u;-particle is close to zero. Letting fy{U) be the density of the full energy of the 
t7-particle, the reversibility condition writes 

7Tyfy{U)ayyj{U) = 7Ty,fyj{U)ayjy{U), (l3) 

for U > meix{Iy, lyj). We take as density / the shifted T-distribution 



fv{u) = I 

(O, otherwise. 

Here + 1. Then equation (13) becomes 



{U - exp[-/3([/ - /„)], if Cl > 4, 



(14) 



-p,/ \ ^vw 



r(i^u,) 






showing that the stationary probabilities TTy of type v are equal to (up to a common 
factor) 

(15) 

and the resulting Markov chain is reversible. 



4.3. Binary reactions without energy dependence 

Let us suppose that ayyj do not depend on energies, so that types evolve indepen- 
dently of the energies. Thus at any time t, we will have probabilities pt{rii, . . . ,nv)> 
We will look for cases when there exists an invariant measure on each de- 
fined by probabilities p(ni, . . . ,ny), and independent conditional distribution of 
energies 

J[ Pv^i’) 



[given m, . . . ,n\/], defined by densities py.iix) — Pv{x), 

Assume all 7^’s are equal, but any reaction v^w can occur and let now a 

reaction v,w ^ v' he given. We again call 

^Pv'Pw' ^1^) 

the canonical kernel corresponding to the reaction v^w ^ v\w' and we denote by 
pyw{T) the density of ^y + ^w 

Theorem 4.4. Suppose an array (pi,...,/9y) of densities is given, satisfying for 
any binary reaction v,w v' ,w’ the conditions 

Pvw{T)= / py{x)pyj{y)dxdy = / py\x)pyjf{y)dxdy = py>yj>{T). 

J x-\-y—T J x-\-y=T 







Assume also canonical kernels and that, as t oo, the limit of pt{ni, . . . ,ny) 
exists. Then there is an invariant measure having these densities. 

4.4. General binary reaction case 

Here the ly's can be different, but we assume only binary reactions v,w v' ,w' 
are possible. 



4.4.1. Complete factorization Denote i a pair of types {v, w). Thus reaction 
v,w v',w' will be written as i j, where i = {v,w),j = {v' ,w'). We shall 
use the analog of the third example with binary reactions. Consider a Markov 
chain Vi x Vi on {1, . . . , V} x {1, . . . , V} with rates bij, such that its stationary 
distribution be a product form = PvPw and the chain be reversible. We 

define, for vector particles i = {v, w), the energies li = ly ly, and 

MU) = {fv*U){u), 

where fy,fw are given in (14). Thus fi{U) has also a shifted T-distribution with 
parameters li = ly -\- ly,, Ui = Vy -\- Uy,, (3. The reversibility condition, with some 
unspecified stationary probabilities tt*, writes 

7TiMU)aij{U) = 7Tjf0)aj,{U), (16) 

where U > max{Ii,Ij). Letting 






0 , 



iiU < Ij ; 
otherwise. 



Here 

Ckj — j 1 — I5 

and the reversibility condition becomes 









r(^,) 






We are looking for solutions = TryTTy,, since we are primarily interested in 

factorizable invariant distributions. To this end, we assume in addition that, for 
any binary reaction v,w ^ v',w', the condition 



Pi; — Pv' Pil;' 

is fulfilled. Then 

TTyTTyjP''^^'"^ e^^^ e^^'^ hij = TTyrTTyj^ hji, 

and up to a common factor, the solution of this system has the form 

7Ty = PyC~^^^ 



(17) 



4.4.2. Unary reactions included Let V = UaVa be a disjoint union of sets 
Va of isomers. Thus, we assume that unary reactions v w axe allowed only if 
V and w belong to the same V^. The energy dependence of unary reactions will 
be defined in the same way as in section 4.2, but additionally we take Uy to be 
constant on each Va, in other words Uy = Vy, for any two isomers v,w G Va. 

We consider the same binary reactions as in section 4.4.1, with the assumption 
that they are concordant with unary reactions in the following sense: P[y^w) == PvPw 
are such that, for any a, the probabilities py have the form given in section 4.2 up 
to a constant factor. 







Theorem 4.5. If the previous conditions are fulfilled, then formula (17) gives the
factorized reversible invariant distribution, both for binary and unary reactions.

Proof: It suffices to compare the formulae (17) and (15), remarking that the
factor $\Gamma(\nu_v)$ in (15) can be omitted, since $\nu_v$ is constant on $V_a$.

References 

[1] E. Carlen, M. Carvalho and M. Loss, Determination of the spectral gap for Kac's
master equation and related stochastic evolutions (2001), Preprint.

[2] M. Ernst, In Nonequilibrium Phenomena I. The Boltzmann Equation, North Holland,
1983.

[3] M. Kac, Probability and Related Topics in Physical Sciences, Interscience Publishers,
1958.

[4] A. Kalinkin, Markov branching processes with interaction, Russian Math. Reviews,
vol. 57, No. 2 (2002), pp. 23-84.

[5] M. A. Leontovich, Main equations of kinetical theory of gases from the random
processes point of view, J. of Experim. and Theor. Physics, vol. 5, No. 3-4 (1935),
pp. 211-231.

[6] V. Malyshev, S. Pirogov and A. Rybko, Random walks and chemical networks,
to appear in Moscow Math. J.

[7] D. McQuarrie, Stochastic approach to chemical kinetics, J. Appl. Prob., 4 (1967),
pp. 413-478.

[8] P. Whittle, Systems in Stochastic Equilibrium, John Wiley, 1986.

Guy Fayolle

INRIA Rocquencourt - Domaine de Voluceau BP 105
78153 Le Chesnay, France. Guy.Fayolle@inria.fr

Vadim Malyshev

INRIA Rocquencourt - Domaine de Voluceau BP 105
78153 Le Chesnay, France. Vadim.Malyshev@inria.fr

Serguei Pirogov 

IPPI - Russian Academy of Sciences 
19 Bolshoi Karetny - 101447 Moscow, Russia. 

Work partially supported by RFBR grant 02-01-01276. 




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Large Deviations of Hellinger Distance 
on Partitions 

Laszlo Gyorfi 

ABSTRACT: We discuss Chernoff-type large deviation properties of the
Hellinger distance on partitions. If $H_n$ denotes the Hellinger distance of the
empirical distribution and the distribution restricted to a partition, then for small
$\epsilon > 0$, $P\{H_n > \epsilon\} \approx e^{-n(\epsilon^2 + o(1))}$, where $n$ is the sample size.



1. Introduction 

We consider the problem of testing an unknown probability density function. The
test statistics are derived from dissimilarity measures of probability measures, like
the $\varphi$-divergences introduced by Csiszar (1967). The most important $\varphi$-divergences in
mathematical statistics and information theory are the total variation distance,
the information divergence and the $\chi^2$-divergence. In this paper we consider the
Hellinger distance.

Let $\varphi : (0, \infty) \to R$ be a convex function, extended to $[0, \infty)$ by continuity.
If $\mu$ and $\nu$ are probability measures on $R^d$ with densities $f$ and $g$ with respect
to a dominating measure $\lambda$, then the $\varphi$-divergence of $\mu$ and $\nu$ is defined by

We can consider some goodness of fit tests for the hypothesis that the
unknown distribution is $\mu$. Suppose that $\mu$ is non-atomic. Assume a sample of
independent random vectors $X_1, \ldots, X_n$ distributed according to a probability
measure $\mu$, and let $\mu_n$ denote the empirical measure. Based on a finite partition
$\mathcal{P}_n = \{A_{n,1}, \ldots, A_{n,m_n}\}$, introduce a test statistic comparing $\mu$ and $\mu_n$.

Usually, the hypothesis is accepted if $D_\varphi(\mu,\mu_n)$ is small. In order to characterize
such a test, either the limit distribution of $D_\varphi(\mu,\mu_n)$ should be known, or
its large deviation property.

For

$$\varphi(t) = |1 - t|$$

the $\varphi$-divergence is called the $L_1$ error:

$$\|f - g\| = \int |f(x) - g(x)|\,\lambda(dx).$$







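Gyorfi and van der Meulen (1991) introduced the corresponding test statistic

$$L_n = \sum_{j=1}^{m_n} |\mu(A_{n,j}) - \mu_n(A_{n,j})|.$$

Beirlant et al. (2001) proved that if

$$\lim_{n\to\infty} \max_j \mu(A_{n,j}) = 0 \quad (3)$$

and

$$\lim_{n\to\infty} \frac{m_n\log n}{n} = 0, \quad (4)$$

then for all $0 < \epsilon < 2$

$$\lim_{n\to\infty} \frac{1}{n}\log P\{L_n > \epsilon\} = -g_L(\epsilon), \quad (5)$$

where

$$g_L(\epsilon) = \inf_{0<p<1-\epsilon/2} D(p\,\|\,p + \epsilon/2), \quad (6)$$

and

$$D(\alpha\,\|\,\beta) = \alpha\log\frac{\alpha}{\beta} + (1-\alpha)\log\frac{1-\alpha}{1-\beta}. \quad (7)$$

It means that

$$P\{L_n > \epsilon\} = e^{-n(g_L(\epsilon) + o(1))}.$$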



For 







$$\varphi(t) = -\log t$$

the $\varphi$-divergence is denoted by $I(\mu,\nu)$ and is called the reversed order information
divergence (also called I-divergence, Kullback-Leibler number, relative entropy).
The corresponding statistic is called the modified likelihood ratio statistic:

rrin /A \ 

Kallenberg (1985), and Quine and Robinson (1985) showed that under (3) and (4),
for all $\epsilon > 0$

$$P\{I_n > \epsilon\} = e^{-n(\epsilon + o(1))}. \quad (8)$$



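2. Hellinger distance

For

$$\varphi(t) = (\sqrt{t} - 1)^2$$

the $\varphi$-divergence is called the squared Hellinger distance, thus

$$H^2(\mu,\nu) = H^2(f,g) = \int \left(\sqrt{\frac{f(x)}{g(x)}} - 1\right)^2 g(x)\,\lambda(dx) = \int \left(\sqrt{f(x)} - \sqrt{g(x)}\right)^2 \lambda(dx),$$

and the corresponding squared Hellinger statistic is defined as

$$H_n^2 = \sum_{j=1}^{m_n} \left(\sqrt{\mu(A_{n,j})} - \sqrt{\mu_n(A_{n,j})}\right)^2 = 2 - 2\sum_{j=1}^{m_n} \sqrt{\mu(A_{n,j})\,\mu_n(A_{n,j})}.$$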



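Theorem 2.1. Assume (3) and (4), and that for all $0 < \epsilon < \sqrt{2}$ the rate function

$$\lim_{n\to\infty} \frac{1}{n}\log P\{H_n > \epsilon\} = -g_H(\epsilon) \quad (9)$$

exists. Then

$$\epsilon^2 \le g_H(\epsilon) \quad (10)$$

and

$$g_H(\epsilon) \le g^*(\epsilon), \quad (11)$$

where

$$g^*(\epsilon) = \inf_{0<p<q<1,\ H((p,1-p),(q,1-q)) \ge \epsilon} D(p\,\|\,q). \quad (12)$$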



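Remark 1. According to Theorem 2.1,

$$\epsilon^2 \le g_H(\epsilon) \le g^*(\epsilon).$$

We may have an upper bound on $g^*(\epsilon)$:

$$g^*(\epsilon) = \inf_{0<p<q<1,\ H((p,1-p),(q,1-q)) \ge \epsilon} D(p\,\|\,q) \le \inf_{0<q<1,\ H^2((0,1),(q,1-q)) \ge \epsilon^2} D(0\,\|\,q).$$

Because of

$$H^2((0,1),(q,1-q)) = 2 - 2\sqrt{1-q}$$

and

$$D(0\,\|\,q) = -\log(1-q),$$

the minimizer is $q = q_\epsilon = 1 - (1 - \epsilon^2/2)^2$. Thus

$$g^*(\epsilon) \le -\log(1 - q_\epsilon) = -2\log(1 - \epsilon^2/2).$$

For $0 < x < 1/2$ we have that $-\log(1-x) < x + x^2$, therefore for $0 < \epsilon < 1$

$$g^*(\epsilon) \le -2\log(1 - \epsilon^2/2) < \epsilon^2 + \epsilon^4/2.$$

This bound and (10) imply that for small $\epsilon$

$$g_H(\epsilon) \approx \epsilon^2.$$

It means that, for small $\epsilon$,

$$P\{H_n > \epsilon\} \approx e^{-n\epsilon^2}.$$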
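The elementary identities used in Remark 1 can be checked numerically. The sketch below uses hypothetical function names and is only a sanity check of the two-point computation, not a computation of $g^*(\epsilon)$ itself.

```python
import math


def hellinger_sq(p, q):
    """Squared Hellinger distance between two discrete distributions."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def q_eps(eps):
    """Value of q for which H^2((0,1),(q,1-q)) equals eps^2."""
    return 1 - (1 - eps ** 2 / 2) ** 2


def g_star_upper(eps):
    """Upper bound -2 log(1 - eps^2/2) on g*(eps) from Remark 1."""
    return -2 * math.log(1 - eps ** 2 / 2)


if __name__ == "__main__":
    for eps in (0.1, 0.5, 0.9):
        q = q_eps(eps)
        # First column: H^2 at q_eps; then eps^2, the bound, and eps^2 + eps^4/2.
        print(hellinger_sq((0.0, 1.0), (q, 1 - q)), eps ** 2,
              g_star_upper(eps), eps ** 2 + eps ** 4 / 2)
```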



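Remark 2. The Hellinger distance is closely related to the $L_1$ error (cf. Devroye,
Gyorfi (1985)):

$$H^2(f,g) = \int \left(\sqrt{f(x)} - \sqrt{g(x)}\right)^2 \lambda(dx) \le \int |f(x) - g(x)|\,\lambda(dx) = \|f - g\|$$

and

$$\|f - g\|^2 \le \int \left(\sqrt{f(x)} - \sqrt{g(x)}\right)^2 \lambda(dx) \int \left(\sqrt{f(x)} + \sqrt{g(x)}\right)^2 \lambda(dx) = H^2(f,g)\,\big(4 - H^2(f,g)\big),$$

therefore

$$H_n^2 \le L_n \le H_n\sqrt{4 - H_n^2}.$$

These bounds imply that

$$g_L(\epsilon^2) \le g_H(\epsilon) \le g_L\big(\epsilon\sqrt{4 - \epsilon^2}\big). \quad (13)$$

For small $\epsilon$,

$$g_L(\epsilon) \approx \epsilon^2/2,$$

therefore the bounds (13) imply that for small $\epsilon$

$$\epsilon^4/2 \le g_H(\epsilon) \le 2\epsilon^2.$$

By numerical calculations, one can verify that for $0 < \epsilon < 1.26$ the lower bound in
(13) is weaker than (10). Next we show that, in general, the upper bound in (13)
is weaker than (11), i.e.

$$g^*(\epsilon) \le g_L\big(\epsilon\sqrt{4 - \epsilon^2}\big).$$

In order to see this put

$$F(\epsilon) = \epsilon\sqrt{4 - \epsilon^2}.$$

Then for $\epsilon < \sqrt{2}$ the function $F(\epsilon)$ is monotonically increasing, therefore

$$g^*(\epsilon) = \inf_{0<p<q<1,\ H((p,1-p),(q,1-q)) \ge \epsilon} D(p\,\|\,q)$$
$$= \inf_{0<p<q<1,\ F(H((p,1-p),(q,1-q))) \ge F(\epsilon)} D(p\,\|\,q)$$
$$\le \inf_{0<p<q<1,\ L_1((p,1-p),(q,1-q)) \ge F(\epsilon)} D(p\,\|\,q)$$
$$= g_L(F(\epsilon))$$
$$= g_L\big(\epsilon\sqrt{4 - \epsilon^2}\big).$$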
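The two-sided relation between the Hellinger distance and the $L_1$ error can be tested on random discrete distributions. The helpers below are hypothetical and the test is a numerical sanity check under a small floating-point tolerance, not a proof.

```python
import math
import random


def l1_distance(p, q):
    """L_1 error between two discrete distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))


def hellinger_sq(p, q):
    """Squared Hellinger distance between two discrete distributions."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def random_distribution(k, rng):
    """Random probability vector of length k (normalized uniform weights)."""
    weights = [rng.random() for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


def check_bounds(p, q, tol=1e-12):
    """Check H^2 <= L_1 <= H * sqrt(4 - H^2), up to a rounding tolerance."""
    h2 = hellinger_sq(p, q)
    l1 = l1_distance(p, q)
    return h2 <= l1 + tol and l1 <= math.sqrt(h2 * (4 - h2)) + tol


if __name__ == "__main__":
    rng = random.Random(0)
    ok = all(check_bounds(random_distribution(5, rng), random_distribution(5, rng))
             for _ in range(1000))
    print(ok)
```

Both bounds are attained for disjoint supports, for example $p = (1,0)$ and $q = (0,1)$, where $H^2 = L_1 = 2$.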



Remark 3. The above result obviously can be generalized to a large deviation
result for the Hellinger inaccuracy rate $H(f, f_n)$ of the histogram density estimator
$f_n$ based on a sample of independent random vectors $X_1, \ldots, X_n$, distributed
according to a probability measure $\mu$ with density $f$ with respect to the Lebesgue
measure $\lambda$.

Introducing a partition $\mathcal{P}_n = \{A_{n,j},\ j \ge 1\}$ of $R^d$ such that $\sup_j \lambda(A_{n,j}) < \infty$,
the histogram density estimator is defined by

$$f_n(x) = \frac{\mu_n(A_n(x))}{\lambda(A_n(x))},$$

where $A_n(x) = A_{n,i}$ if $x \in A_{n,i}$. Mimicking the proof of Corollary 1 in Beirlant et
al. (2001) we can get the following:

Assume that for each sphere $S$ centered at the origin

$$\lim_{n\to\infty} \sup_{A_{n,j}\cap S \ne \emptyset} \mathrm{diam}(A_{n,j}) = 0,$$

and there exists a sequence of spheres $S_n$ centered at the origin such that $S_n \uparrow R^d$
and

$$\lim_{n\to\infty} \mathrm{card}\{j : A_{n,j}\cap S_n \ne \emptyset\}\,\frac{\log n}{n} = 0.$$

Then for all $0 < \epsilon < \sqrt{2}$

$$\lim_{n\to\infty} \frac{1}{n}\log P\{H(f,f_n) > \epsilon\} = -g_H(\epsilon), \quad (14)$$

where $g_H(\epsilon)$ is defined by (9).

Note that in this statement there is no condition on $f$, and so (14) holds for all $f$,
and the rate function $g_H(\epsilon)$ does not depend on $f$.



3. Proof 

We derive the lower bound using the reversed order information divergence. The
squared Hellinger distance can be written as

$$H^2(\mu,\nu) = \int \left(\sqrt{f(x)} - \sqrt{g(x)}\right)^2 \lambda(dx) = 2 - 2\int \sqrt{f(x)g(x)}\,\lambda(dx) = D_{\tilde\varphi}(\mu,\nu),$$

where

$$\tilde\varphi(t) = 2(1 - \sqrt{t}).$$

The reversed order information divergence was defined by $\varphi(t) = -\log t$, therefore
because of the inequality

$$2(1 - \sqrt{t}) \le -\log t,$$

we have that

$$H^2(\mu,\nu) \le I(\mu,\nu),$$

which implies that

$$H_n^2 \le I_n$$

and together with (8) we have (10).
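The pointwise inequality behind this step is easy to test numerically. In the sketch below the discrete form `divergence(phi, p, q)`, namely $\sum_j q_j\,\varphi(p_j/q_j)$, is an assumption made for illustration, and all function names are hypothetical. With $\tilde\varphi$ it reproduces the squared Hellinger distance, and since $\tilde\varphi(t) \le -\log t$ pointwise, the corresponding divergences are ordered in the same way, whichever order of the arguments is used.

```python
import math


def phi_tilde(t):
    """phi~(t) = 2 (1 - sqrt(t))."""
    return 2 * (1 - math.sqrt(t))


def neg_log(t):
    """phi(t) = -log t."""
    return -math.log(t)


def divergence(phi, p, q):
    """Assumed discrete phi-divergence: sum_j q_j * phi(p_j / q_j), for positive p, q."""
    return sum(b * phi(a / b) for a, b in zip(p, q))


def hellinger_sq(p, q):
    """Squared Hellinger distance between two discrete distributions."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


if __name__ == "__main__":
    p = [0.2, 0.3, 0.5]
    q = [0.4, 0.4, 0.2]
    print(hellinger_sq(p, q), divergence(phi_tilde, p, q), divergence(neg_log, p, q))
```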

Let fin and be the restrictions of and fin to the partition In the 
proof of (11) we shall use the following lemma. 

Lemma 3.1. (Sanov (1957), see p. 16 in Dembo, Zeitouni (1992)). Let E be a finite 
set (alphabet), Ln be a set of types (possible empirical distributions) on E, and let 
T be a set of distributions on E. Then 



-logP{/x; er}+ inf I{T,fln) 

n rernCn 



< 



|E|log(n + l) 



n 



( 15 ) 



where jEj denotes the cardinality ofTi. 

We apply (15) for 

^ ~ {-^n,l j • • • 5 ^n,mn } 



such that 



Then, according to (15), 



r = {r : H{fin,r) > e}. 



- log P{Hn >e}+ inf /(r, pn) 
n TGrnxi< 7 T, 



< 



m„log(n + 1) 



n 







and therefore, under (4), 



1 



~9 h{^)= lim - logP{if„ > e} = - lim inf I{T,^in). 

n— >oo n n-^oo rernil/rr 

The distributions in Hn are possible empirical distributions, having components 
of the form where r is integer. Because of (3) we have that 

rrin oo, 

therefore the continuity of JT(r, and implies that 



Here 

Put 

and 



gH{e)= lim inf I{T,fi„)= lim inf I{T,Hn). 

n^oo rernLn n-^oo 



/A \ 

L = {j • j)} 



(16) 



An — ^j^LAnJ' 

In order to get an upper bound on consider a subclass of the distributions 

such that is constant both on L and L^. Then for such r, 

£ - yjT{An,j^ 

nar \ '' jeL^ ^ 



jeL 



— H” l^{An) V^l T^An)^ 

= {{r{An), 1 - r{An)), (/i^(An), 1 - /^(An))) 



and 



^r(Anj)log 

jeL 



t(A, 



njj 






+ XI log 

jeL^ 






1 - r(An) 



Thus 



D(T(An)\\M(An))- 



lim inf I(r,/Un) 

n^oo H(r,ftri)>e 

< inf D{T{An)\\n{An)) 



= 5*(e), 

and we proved (11). 







References 

[1] Beirlant, J., Devroye, L., Gyorfi, L. and Vajda, I. (2001). Large deviations of
divergence measures on partitions. J. Statist. Planning Infer., 93, 1-16.

[2] Csiszar, I. (1967). Information-type measures of divergence of probability
distributions and indirect observations. Studia Sci. Math. Hungar., 2, pp. 299-318.

[3] Dembo, A. and Zeitouni, O. (1992). Large Deviations Techniques and Applications.
Jones and Bartlett Publishers.

[4] Devroye, L. and Gyorfi, L. (1985). Nonparametric Density Estimation: the $L_1$ View.
Wiley, New York.

[5] Gyorfi, L. and van der Meulen, E. C. (1991). A consistent goodness-of-fit test based
on the total variation distance. In: Nonparametric Functional Estimation and Related
Topics (G. Roussas, ed.), Kluwer, Boston, 631-646.

[6] Kallenberg, W. C. M. (1985). On moderate and large deviations in multinomial
distributions. Annals of Statistics, 13, 1554-1580.

[7] Quine, M. P. and Robinson, J. (1985). Efficiencies of chi-square and likelihood ratio
goodness-of-fit tests. Ann. Statist., 13, 727-742.

[8] Sanov, I. N. (1957). On the probability of large deviations of random variables. Mat.
Sb., 42, pp. 11-44 (English translation in Sel. Transl. Math. Statist. Prob., 1, (1961),
pp. 213-244).

Laszlo Gyorfi 

Department of Computer Science and Information Theory, Technical University 
of Budapest, 1521 Stoczek u. 2, Budapest, Hungary, 
e-mail: gyorfi@szit.bme.hu 




Trends in Mathematics, © 2004 Birkhauser Verlag Basel/Switzerland 



Estimation of the Offspring Mean for a General 
Class of Size-Dependent Branching Processes. 
Application to Quantitative Polymerase Chain 
Reaction 



Nadia Lalam and Christine Jacob 



ABSTRACT: We first address the problem of estimating the offspring mean 
for a general class of size-dependent branching processes. Then we apply our results 
to the particular setting of Quantitative Polymerase Chain Reaction. 

1. Estimation in a general size-dependent setting 

We consider a single-type supercritical or near-critical size-dependent branching
process $\{N_n\}_n$ such that the offspring mean $m(N_n)$ converges to a limit $m \ge 1$
with a rate of convergence of order $N_n^{-\alpha}$ as the population size $N_n$ grows to $\infty$, and
the offspring variance $\sigma^2(N_n)$ varies at the rate $N_n^{\beta}$, where $-1 \le \beta < 1$.
We assume that the offspring mean $m(N) = m + \mu N^{-\alpha} + o(N^{-\alpha})$ depends on an
unknown asymptotically identifiable parameter $\theta_0$ that belongs either to the limit
model ($\theta_0 = m$) or to the transient model ($\theta_0 = \mu$). When $\theta_0 = m$, $m(N) - m$ is a
nuisance part asymptotically negligible relative to the identifiable part $m$; when
$\theta_0 = \mu$, we assume that $m$ and $\alpha$ are known and $m(N) - (m + \mu N^{-\alpha})$ is the
nuisance part asymptotically negligible relative to the identifiable part $m + \mu N^{-\alpha}$.
For ease of presentation, we will denote $m(N)$ by $m_{\theta_0,\nu_0}(N)$ where $\theta_0$ is the true
identifiable parameter and $\nu_0$ represents the nuisance term.

We estimate $\theta_0$ on the non-extinction set of the process from the observations
$\{N_h, \ldots, N_n\}$, by using the conditional least-squares method weighted by $\{N_{n-1}^{-\gamma}\}_n$.
Let
\[ \hat\theta_{n,\gamma,\nu} = \arg\min_{\theta} \sum_{k=h+1}^{n} \big(N_k - m_{\theta,\nu}(N_{k-1})\,N_{k-1}\big)^2 N_{k-1}^{-\gamma}, \]
where $\nu$ has a given value. We study the asymptotic properties of $\{\hat\theta_{n,\gamma,\nu}\}_n$ according to $\gamma$, with either
$h$ or $n - h$ remaining constant as $n \to \infty$. The main assumptions to get the
consistency concern the asymptotic behavior of the process, namely the existence of
a deterministic sequence $\{a_n\}_n$ such that $\lim_{n\to\infty} N_n a_n^{-1} = W$ for some random
variable $W$. To get the convergence rate of the estimator, we assume mainly a
condition of the Lindeberg type. Let $\alpha_0 = 0$ when $\theta_0 = m$ and $\alpha_0 = \alpha$ when
$\theta_0 = \mu$. Our results are given in the following theorem:



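Theorem 1.1. a) Strong consistency: $\lim_{n\to\infty} \hat\theta_{n,\gamma,\nu} = \theta_0$;

b) Rate of convergence: there exist stochastic sequences $\{P_{n,\gamma,\nu}\}_n$, $\{Q_{n,\gamma,\nu}\}_n$
such that $\varepsilon_{n,\gamma}^{-1}(\hat\theta_{n,\gamma,\nu} - \theta_0) = P_{n,\gamma,\nu}/Q_{n,\gamma,\nu}$, where $\lim_{n\to\infty} P_{n,\gamma,\nu} = P$ with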

£'[g(^^P)]^_^Jexp (— andlim^_,oo Qn.'y.v 
The rate of convergence is
\[ \varepsilon_{n,\gamma}^{-1} = \Big[\sum_{k=h+1}^{n} N_{k-1}^{2(1-\alpha_0)-\gamma}\Big] \Big[\sum_{k=h+1}^{n} N_{k-1}^{1+\beta+2(1-\alpha_0)-2\gamma}\Big]^{-1/2} \]
and the best rate is attained for $\gamma = 1 + \beta$.

We prove the consistency by using the minimum contrast method and a strong 
law of large numbers for martingales [2]. We get the asymptotic distribution of 
the estimator by applying a Taylor approximation to the conditional least-squares 
contrast and by using Rahimov’s central limit theorem for random sums. For more 
details, see [4]. Notice that to obtain the consistency, it is sufficient to assume that 
the offspring variance satisfies (t‘^{N) < and the normalizing sequence {a^jn 
may be stochastic (e.g. $a_n = N_n$ for all $n$).
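As an illustration of the weighted conditional least-squares method described in this section, the following sketch treats the simplest case only: the limit model with the nuisance term dropped, so that the fitted conditional mean of N_k given N_{k-1} is theta * N_{k-1}. This simplification, the function name and the argument names are assumptions made for the example and are not taken from [4]. Under that assumption the contrast is quadratic in theta and its minimizer is the ratio of two weighted sums.

```python
def weighted_cls_offspring_mean(counts, gamma):
    """Minimize sum_k (N_k - theta * N_{k-1})^2 * N_{k-1}^(-gamma) over theta.

    Assumes the simplified model m_theta(N) = theta (no nuisance term) and
    strictly positive population sizes, i.e. the non-extinction set.
    """
    if len(counts) < 2:
        raise ValueError("need at least two successive observations")
    if any(c <= 0 for c in counts[:-1]):
        raise ValueError("population sizes must be positive")
    pairs = list(zip(counts[:-1], counts[1:]))
    # Setting the derivative of the contrast to zero gives
    # theta = sum N_k N_{k-1}^(1-gamma) / sum N_{k-1}^(2-gamma).
    numerator = sum(cur * prev ** (1.0 - gamma) for prev, cur in pairs)
    denominator = sum(prev ** (2.0 - gamma) for prev, _ in pairs)
    return numerator / denominator
```

With gamma = 1 the estimate reduces to the total number of offspring divided by the total number of parents, and with gamma = 0 it is the unweighted least-squares slope through the origin.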

2. Application to Quantitative Polymerase Chain Reaction 
(QPCR) 

We apply our theoretical results to the particular setting of the QPCR, which consists
of the in vitro amplification of a population of DNA molecules and which aims at
quantifying the initial amount $N_0$ of the population. The PCR is widely used in
molecular biology since it enables the detection of low-abundance DNA, and it has
many applications (for more details, see [1]).

In the PCR setting, the current quantification method is based on inadequate
mathematical models and needs many amplification trajectories to estimate the
efficiency in order to quantify $N_0$, where the efficiency is the probability that a
DNA molecule will be duplicated.

The branching process theory is naturally used to model the PCR and recently 
a size-dependent branching process has been developed by Jagers and Klebaner 
[3]. We propose a new modelling of a PCR amplification trajectory relying on 
size-dependent branching processes. Our modelling is based on the concept of sat- 
uration and generalizes the efficiency model proposed in [3]. 

We estimate the reaction efficiency from successive observations of a single tra- 
jectory and we study the properties of the estimators at finite distances using 
simulations and real-time PCR data. 
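To make the notion of a size-dependent amplification trajectory concrete, here is a minimal simulation sketch. It assumes only what is stated above, namely that each molecule is duplicated with a probability (the efficiency) that may depend on the current population size. The particular saturation curve, its parameter values and all function names are hypothetical choices for illustration; they are not the model of [3] or [5].

```python
import random


def simulate_pcr(n0, cycles, efficiency, rng):
    """Simulate one amplification trajectory N_0, ..., N_cycles.

    In each cycle every molecule is duplicated independently with
    probability efficiency(N), where N is the current population size,
    so the offspring mean is m(N) = 1 + efficiency(N).
    """
    sizes = [n0]
    for _ in range(cycles):
        n = sizes[-1]
        p = efficiency(n)
        duplicated = sum(1 for _ in range(n) if rng.random() < p)
        sizes.append(n + duplicated)
    return sizes


def saturating_efficiency(n, p_max=0.9, capacity=5000.0):
    """Hypothetical saturation curve: close to p_max for small n, decreasing to 0."""
    return p_max * capacity / (capacity + n)


if __name__ == "__main__":
    trajectory = simulate_pcr(10, 12, saturating_efficiency, random.Random(0))
    print(trajectory)
```

A trajectory produced this way is non-decreasing and can at most double in each cycle, which gives a quick sanity check on any simulated or observed data set.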

References 

[1] Ferre F., 1998, Gene quantification, Ed. Ferre F., Birkhauser, New York.

[2] Hall P. & Heyde C. C., 1980, Martingale limit theory and its application, Probability
and Mathematical Statistics, Academic Press, New York.

[3] Jagers P. & Klebaner F. C., 2003, Random variation and concentration effects in
PCR, J. Theoret. Biol., 224, 299-304.

[4] Lalam N. & Jacob C., 2004, Estimation of the offspring mean in a supercritical or
near-critical size-dependent branching process, Adv. Appl. Prob. (accepted).

[5] Lalam N., Jacob C. & Jagers P., 2004, Modelling the PCR amplification process by a
size-dependent branching process and estimation of the efficiency, Adv. Appl. Prob.
(accepted).

Nadia Lalam 

EURANDOM, P.O. Box 513, 5600 MB Eindhoven, The Netherlands 
lalam@eurandom.tue.nl

Christine Jacob 

INRA, Laboratoire de Biometrie, 78352 Jouy-en-Josas Cedex, France 
cj@banian.jouy.inra.fr







Decidability of Simple Brick Codes 

Malgorzata Moczurad and Wlodzimierz Moczurad 



ABSTRACT: Bricks are polyominoes with labelled cells. The problem whether
a given set of bricks is a code is undecidable in general. It is open for
two-element sets. Here we consider sets consisting of square bricks only. We show that
in this setting, the codicity of small sets (two bricks) is decidable, but 15 bricks are
enough to make the problem undecidable. Thus the frontier between decidability
and undecidability lies somewhere between these two numbers. Additionally we
show that the undecidability frontier could be improved to 13 if the Post
Correspondence Problem with three pairs should prove undecidable.



1. Introduction 

Let A be a finite alphabet of labels. A brick is a partial mapping $k : \mathbb{Z}^2 \to A$,
where dom k is finite and connected. It can be viewed as a polyomino with its cells
labelled with the symbols of A. The set of all bricks over A is denoted by $A^{\square}$.
Given a set of bricks $X \subseteq A^{\square}$, the set of all bricks tilable with (translated copies
of) the elements of X is denoted by $X^{\square}$. Note that we do not allow rotations of
bricks. $X \subseteq A^{\square}$ is a brick code if every element of $X^{\square}$ admits exactly one tiling
with the elements of X. The effective alphabet of $X \subseteq A^{\square}$ is the set of all symbols
that appear on bricks in X, i.e., $\bigcup_{k \in X} k(\mathrm{dom}\, k)$.

The problem whether a given set of bricks (even polyominoes) is a code, is 
undecidable in general [1, 5]. The problem is open for two-element sets. Here we 
consider sets consisting of square bricks only. We show that in this setting, the 
codicity of small sets (two bricks) is decidable, but 15 bricks are enough to make the 
problem undecidable. Note that apart from the single-label case, the decidability 
of sets of squares does not depend on the size of the effective alphabet, since a 
larger alphabet can be “simulated” with two symbols at the expense of the size of 
the squares. 

2. Decidability frontier 

Consider sets consisting of just two bricks, each of them being a square (in the 
geometrical sense). We show that there exists a simple algorithm to verify whether 
a set of this kind is a brick code. 

Proposition 2.1. Let $X = \{k, l\} \subseteq A^{\square}$, where dom k and dom l are squares, $k \ne l$.
Then X is not a brick code iff k and l have a common rectangular tiler, i.e., there
exists a rectangle $t \in A^{\square}$ such that $k, l \in \{t\}^{\square}$.

Proposition 2.2. If $k, l \in A^{\square}$ have a common rectangular tiler, then they have a
common square tiler.

Corollary 2.3. Let $X = \{k, l\} \subseteq A^{\square}$, where dom k and dom l are squares. It is
decidable whether X is a brick code.
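The decision procedure behind Corollary 2.3 can be sketched as follows. The encoding of a square brick as a list of rows of labels and the function names are assumptions made for this example. The sketch also relies on the observation that a square tiled by translated copies of a smaller square must be tiled in the aligned grid pattern, so any common square tiler has a side dividing gcd of the two side lengths and can be repeated up to a tiler of exactly that side.

```python
from math import gcd


def is_tiled_by(brick, tile):
    """True if the square `brick` is the aligned grid repetition of the square `tile`."""
    size, step = len(brick), len(tile)
    if size % step != 0:
        return False
    return all(
        brick[r][c] == tile[r % step][c % step]
        for r in range(size)
        for c in range(size)
    )


def is_two_square_brick_code(k, l):
    """Decide whether X = {k, l} is a brick code for two distinct square bricks.

    By Propositions 2.1 and 2.2, X fails to be a code iff k and l share a
    square tiler; it suffices to test the top-left block of side gcd(|k|, |l|).
    """
    if k == l:
        raise ValueError("the two bricks must be distinct")
    g = gcd(len(k), len(l))
    candidate = [row[:g] for row in k[:g]]
    return not (is_tiled_by(k, candidate) and is_tiled_by(l, candidate))
```

For instance, a single cell labelled a together with a 2 x 2 square of a's is not a code, while the same cell together with a 2 x 2 checkerboard of a's and b's is.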







A Thue system can be reduced to a brick code problem with square bricks.
There exists a non-erasing Thue system with an undecidable word problem over a
two-letter alphabet with just three relations. This is the smallest example known
to us, due to Matiyasevich [4]. We encode this system, including two arbitrary
input words u and v, in a set $X_{S,u,v}$ containing 15 square bricks.

Proposition 2.4. Let $(\Sigma, S)$ be a Thue system and let $u, v \in \Sigma^*$. Let $X_{S,u,v}$ be
the set constructed as described above. The following equivalence holds: $u =_S v$ iff
$X_{S,u,v}$ is not a brick code.

Corollary 2.5. The codicity problem for sets containing 15 or more square bricks 
is undecidable. 

The question whether a solution exists for a given Post system S is undecidable
in general. The problem is open for $3 \le |S| \le 6$, cf. [2]. Using a reduction
similar to the Thue-to-brick one and starting with a PCP with three pairs, we
obtain a set of 13 square bricks.

Corollary 2.6. If PCP with three pairs is undecidable, then the codicity problem 
for sets containing 13 square bricks is undecidable. 

3. Conclusions 

(i) If the effective alphabet is trivial, the problem is trivially decidable. (ii) If the
effective alphabet contains at least two symbols, we have decidability for two-element
sets and undecidability for sets with at least 15 elements. (iii) The undecidability
frontier could be improved to 13 if PCP with three pairs should prove undecidable.
We are of course interested in finding the exact frontier between decidability and
undecidability (cf. [3]).



References 

[1] P. Aigrain, D. Beauquier: Polyomino tilings, cellular automata and codicity. Theoret.
Comp. Sci. 147 (1995) 165-180.

[2] V. Halava, T. Harju, M. Hirvensalo: Binary (generalized) Post Correspondence
Problem. Theoret. Comp. Sci. 276 (2002) 183-204.

[3] M. Margenstern: Frontiers between decidability and undecidability: a survey.
Theoret. Comp. Sci. 231 (2000) 217-251.

[4] Yu. Matiyasevich: Word problem for Thue systems with a few relations. LNCS 909
(1995) 39-53.

[5] W. Moczurad: Algebraic and algorithmic properties of brick codes. Doctoral thesis, 
Jagiellonian University (1999). 

[6] W. Moczurad: Brick codes: families, properties, relations. Intern. J. Comp. Math. 
74 (2000) 133-150. 



Malgorzata Moczurad and Wlodzimierz Moczurad 

Inst. of Computer Science, Jagiellonian Univ., Nawojki 11, 30-072 Krakow, Poland
{mmoczurad,wkm}@ii.uj.edu.pl







A Constrained Version of Sauer’s Lemma 

Joel Ratsaby 



ABSTRACT: We generalize Sauer’s Lemma to finite VC-dimension classes
$\mathcal{H}$ of binary-valued functions on $[n] = \{1, \ldots, n\}$ which have a margin of at least
N on every element in a sample $S \subseteq [n]$ of cardinality $l$, where the margin $\mu_h(x)$
of $h \in \mathcal{H}$ on a point $x \in [n]$ is defined as the largest non-negative integer a such
that h is constant on the interval $I_a(x) = [x - a, x + a]$.



1. Introduction 

Estimation of the complexity of classes of binary-valued functions has been behind
many recent developments in the theory of learning. In a seminal paper Vapnik
& Chervonenkis [9] applied the law of large numbers uniformly over an infinite
class $\mathcal{F}$ of binary functions, i.e., indicator functions of sets A in a general domain
X, and showed that the complexity of the problem of learning pattern recognition
from samples of n randomly drawn examples can be characterized in terms of a
combinatorial complexity of $\mathcal{F}$.

This complexity, known as the growth function of $\mathcal{F}$,
counts the maximal number of dichotomies, i.e., binary vectors corresponding to
the restriction of functions $f \in \mathcal{F}$ to a finite subset $S \subset X$ of cardinality n,
where the maximum runs over all such S. The Vapnik-Chervonenkis dimension of
$\mathcal{F}$, denoted as $VC(\mathcal{F})$, plays a crucial role in controlling the rate at which the
growth function increases with n. Such binary vectors may be viewed as binary-valued functions
on a finite domain $[n] = \{1, \ldots, n\}$ and hence form a finite class $\mathcal{H}$ of the same
VC-dimension as $\mathcal{F}$.

In this paper we consider finite VC-dimension classes $\mathcal{H}(S)$ of binary-valued
functions on [n] which satisfy the constraint of having a large margin on a
set $S \subset X$, where the margin $\mu_h(x)$ of $h \in \mathcal{H}(S)$ on a point $x \in [n]$ is
defined as the largest non-negative integer a such that h is constant on the interval
$I_a(x) = [x - a, x + a]$. We write $\mu_S(h) = \min_{x \in S} \mu_h(x)$. The main result of
the paper is an estimate on the cardinality of $\mathcal{H}(S)$.

Part of the motivation behind our work arises from results obtained in re- 
cent years which show that learning classes of real-valued functions which are 
constrained to have a large margin on a training sample yields more-accurate hy- 
potheses. There has thus been significant interest in learning-algorithms which 
maximize the sample margin as for instance in support vector machines (see Vap- 
nik [8], Cristianini & Shawe-Taylor [3]). We note in passing that samples on which 
the target function (the one to be learnt) has a large margin are therefore of 
considerable information worth to a learner and estimates of the complexity of 
sets of such samples have been recently obtained in Ratsaby [6] with an explicit 
dependence on the margin parameter and sample size. 

The current paper extends Sauer’s result on the cardinality of finite
VC-dimension classes $\mathcal{H}$ (see Lemma 2.1 below) to classes $\mathcal{H}(S)$ of functions with the
above constraint. While we anticipate interesting learning-theoretic consequences
emerging from our constrained version of Sauer’s result, such investigation
is beyond the scope of this paper.

We start by introducing some needed notation. 



2. Some notations, definitions and existing results 

Let $I(E)$ denote the indicator function which equals 1 if the expression E is true
and 0 otherwise. Let F be a class of functions $f : [n] \to \{0, 1\}$. For a set $A =
\{a_1, \ldots, a_k\} \subseteq [n]$ denote by $f_{|A} = [f(a_1), \ldots, f(a_k)]$. F is said to shatter A if
\[ |\{f_{|A} : f \in F\}| = 2^k. \]
The Vapnik-Chervonenkis dimension of F, denoted as VC(F), is defined as the
cardinality of the largest set shattered by F.

Sauer [4] obtained the following result: 

Lemma 2.1. [4] If the VC-dimension of F is d then
\[ |F| \le \sum_{i=0}^{d} \binom{n}{i}. \]
We note that the bound is tight as for all $d, n \ge 1$ there exist classes $F \subseteq 2^{[n]}$
of VC-dimension d which achieve the equality.
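The bound of Lemma 2.1 and its tightness can be checked by brute force on small domains. The sketch below represents a function on [n] as a tuple of n bits (position i holding the value at i + 1); this encoding, the helper names, and the choice of the class of all functions with at most d ones as the extremal example are assumptions of the illustration rather than part of the paper.

```python
from itertools import combinations
from math import comb


def sauer_bound(n, d):
    """Right-hand side of Sauer's Lemma: sum of binomial(n, i) for i = 0..d."""
    return sum(comb(n, i) for i in range(d + 1))


def vc_dimension(functions, n):
    """Size of the largest subset of positions on which all 0/1 patterns occur."""
    best = 0
    for size in range(1, n + 1):
        shattered = any(
            len({tuple(f[i] for i in subset) for f in functions}) == 2 ** size
            for subset in combinations(range(n), size)
        )
        if not shattered:
            break  # subsets of shattered sets are shattered, so we can stop
        best = size
    return best


def at_most_d_ones(n, d):
    """All binary functions on n points taking the value 1 at most d times."""
    return [
        tuple(1 if i in ones else 0 for i in range(n))
        for size in range(d + 1)
        for ones in combinations(range(n), size)
    ]
```

For n = 5 and d = 2 the class built by `at_most_d_ones` has VC-dimension 2 and contains exactly `sauer_bound(5, 2)` functions, so the bound is met with equality.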

Consider the following definition of functional margin¹ which naturally suits
binary-valued functions.

Definition 1. The margin $\mu_f(x)$ of $f \in F$ on an element $x \in [n]$ is the largest
non-negative integer a such that f has a constant value of either 0 or 1 on the
interval set $I_a(x) = \{\max\{1, x - a\}, \ldots, \min\{x + a, n\}\}$.

The sample-margin $\mu_S(f)$ of f on a subset $S \subseteq [n]$ is defined as
\[ \mu_S(f) = \min_{x \in S} \mu_f(x). \]

More generally, this definition applies also to classes on other domains X if there 
is a linear ordering on X. 
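Definition 1 translates directly into code. In the sketch below a function f on [n] is given as a list of n bits with f[0] holding f(1); the function names are hypothetical. One convention is an assumption of this example and not stated in the text: since the interval is clipped to [n], a function that is constant on all of [n] would satisfy the condition for every a, so the search is capped at max(x - 1, n - x), the first value at which the clipped interval covers [n].

```python
def margin(f, x):
    """Margin of f at the point x in [n] (Definition 1); f[0] holds f(1)."""
    n = len(f)
    if not 1 <= x <= n:
        raise ValueError("x must lie in [n]")
    a = 0
    # Beyond max(x - 1, n - x) the clipped interval no longer grows.
    while a < max(x - 1, n - x):
        lo, hi = max(1, x - a - 1), min(x + a + 1, n)
        if any(f[i - 1] != f[x - 1] for i in range(lo, hi + 1)):
            break
        a += 1
    return a


def sample_margin(f, sample):
    """Sample-margin of f on a subset of [n]: the minimum margin over the sample."""
    return min(margin(f, x) for x in sample)
```

For f = [0, 0, 0, 1, 1] the margins at x = 1, 2, 4 are 2, 1, 0 respectively, so the sample-margin on {1, 2} is 1 and on {1, 2, 4} it is 0.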



3. Technical results 

We start with the main result of the paper: 

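Theorem 3.1. Let $\mathcal{H}$ be a class of binary-valued functions h on [n] having $VC(\mathcal{H}) \le
d$. Let $S \subseteq [n]$ have cardinality $l \ge 1$ and consider the subclass $\mathcal{H}(S) \subseteq \mathcal{H}$ which
consists of all functions $h \in \mathcal{H}$ with a margin $\mu_h(x) > N$ only on elements $x \in S$.
Then
\[ |\mathcal{H}(S)| \le 2\,\beta_d^{(N)}(n - l - N - 1) \]
where $\beta_d^{(N)}$ is defined in Lemma 3.3 below.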



¹ For other definitions of margin see for instance [3].







Proof. The condition $\mu_h(x) > N$ implies only two types of functions h are allowed,
those which take either a constant-0 value or a constant-1 value over all elements in
the interval $I_N(x)$. The condition $\mu_h(x) \le N$ implies that any function is possible
except those taking a constant-0 or a constant-1 value over $I_N(x)$. Hence clearly
the first condition is significantly more restrictive.

Since we seek an upper bound on $|\mathcal{H}(S)|$ we consider among all subsets of
[n] of cardinality $l$ a set S with the least restrictive constraint, namely, with as
few elements x as possible for which $\mu_h(x) > N$. This is achieved by a
maximally-packed set $S^* \subseteq [n]$ of $l$ elements, for instance $S^* = \{1, \ldots, l\}$. It yields a
minimal-size region $R = \{1, \ldots, l + N + 1\}$ on which every candidate h must take either a
constant-0 or constant-1 value, i.e., have a margin larger than N for every $x \in S^*$.
This leaves a maximal-size region $[n] \setminus R$ on which the less stringent constraint of
having a margin no larger than N must hold² for each function h. By Lemma 3.4
(see below), there are no more than $\beta_d^{(N)}(n - l - N - 1)$ functions in $\mathcal{H}$ that satisfy
the latter. Hence for any $S \subseteq [n]$ of cardinality $|S| = l$, $|\mathcal{H}(S)| \le 2\,\beta_d^{(N)}(n - l - N - 1)$.
□

We proceed by starting with a few auxiliary lemmas. 



Lemma 3.2. For $N \ge 0$, $n \ge 0$, $0 \le m \le n$, let $w_{m,N}(n)$ be the number of standard
(one-dimensional) ordered partitions of a nonnegative integer n into m parts each
no larger than N. Then
\[ w_{m,N}(n) = \begin{cases} I(n = 0) & \text{if } m = 0 \\[4pt] \displaystyle\sum_{i=0,N+1,2(N+1),\ldots}^{n} (-1)^{i/(N+1)} \binom{m}{i/(N+1)} \binom{n - i + m - 1}{n - i} & \text{if } m \ge 1. \end{cases} \]



Remark 3.1. While our interest is in $[n] = \{1, \ldots, n\}$, we allow $w_{m,N}(n)$ to be
defined on n = 0 for use by Lemma 3.3.



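Proof. The generating function (g.f.) for $w_{m,N}(n)$ is
\[ W(x) = \sum_{n \ge 0} w_{m,N}(n)\, x^n = \left(\frac{1 - x^{N+1}}{1 - x}\right)^m. \]
When m = 0 the only non-zero coefficient is that of $x^0$ and it equals 1, so $w_{0,N}(n) = I(n = 0)$.
Let $T(x) = (1 - x^{N+1})^m$ and $S(x) = \left(\frac{1}{1 - x}\right)^m$. Then
\[ T(x) = \sum_{i=0}^{m} (-1)^i \binom{m}{i} x^{i(N+1)} \]
which generates the sequence $t_N(n) = (-1)^{n/(N+1)} \binom{m}{n/(N+1)} I(n \bmod (N+1) = 0)$.
Similarly, for $m \ge 1$, it is easy to show $S(x)$ generates $s(n) = \binom{n + m - 1}{n}$. The
product $W(x) = T(x)S(x)$ generates their convolution $t_N(n) * s(n)$, namely,
\[ w_{m,N}(n) = \sum_{i=0,N+1,2(N+1),\ldots}^{n} (-1)^{i/(N+1)} \binom{m}{i/(N+1)} \binom{n - i + m - 1}{n - i}. \]
□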
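The closed form of Lemma 3.2 can be checked numerically against a direct count. In the sketch below the parts are taken to be integers between 0 and N, which is the reading implied by the generating function in the proof; the function names are hypothetical.

```python
from math import comb


def w_formula(m, N, n):
    """Inclusion-exclusion form: ordered partitions of n into m parts in {0..N}."""
    if m == 0:
        return 1 if n == 0 else 0
    total = 0
    for j in range(n // (N + 1) + 1):
        i = j * (N + 1)  # i runs over 0, N+1, 2(N+1), ... up to n
        total += (-1) ** j * comb(m, j) * comb(n - i + m - 1, n - i)
    return total


def w_bruteforce(m, N, n):
    """Same count by dynamic programming: add one part at a time."""
    counts = [1] + [0] * n  # with zero parts only the integer 0 is reachable
    for _ in range(m):
        counts = [
            sum(counts[t - p] for p in range(min(N, t) + 1))
            for t in range(n + 1)
        ]
    return counts[n]
```

For example, with m = 2, N = 1 and n = 2 both functions return 1, corresponding to the single ordered partition 1 + 1.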



² Note, it is possible to have a constant value b (where $b \in \{0, 1\}$) on $\{1, \ldots, l + N + 1\}$ and a
value of $1 - b$ on the $(l + N + 2)$th element, which is necessary for h in order to have $\mu_h(x) > N$
for $x \in \{1, \ldots, l\}$ and $\mu_h(x) \le N$ for $x \in \{l + 1, \ldots, n\}$.







Remark 3.2. By an alternate proof one obtains a slightly simpler form of 



over m> 1. 



We have two additional lemmas. 



Lemma 3.3. Let the integer $1 \le N \le n$ and consider the class F consisting of all
binary-valued functions f on [n] which take the value 1 on no more than $r \le n$
elements of [n] and whose margin on any element $x \in [n]$ satisfies $\mu_f(x) \le N$.
Then
\[ |F| \le \sum_{k=0}^{r} \sum_{m=1}^{n} c(k, n - k; m, N) \triangleq \beta_r^{(N)}(n) \]
where
\[ c(k, n - k; m, N) = \sum_{i=(k-m-N+1)_+}^{k-m+1} \; \sum_{j=(n-k-m-N+1)_+}^{n-k-m+1} w_{m-1,2N}(i)\, w_{m-1,2N}(j), \]
$(a)_+ = \max\{a, 0\}$ and $w_{m,N}(n)$ is defined in Lemma 3.2.



Proof. Consider the integer pair $[k, n - k]$, where $n \ge 1$ and $0 \le k \le n$. A
two-dimensional ordered m-partition of $[k, n - k]$ is an ordered partition into m
two-dimensional parts $[a_j, b_j]$, where $0 \le a_j, b_j \le n$ but not both are zero and
where $\sum_{j=1}^{m} [a_j, b_j] = [k, n - k]$. For instance, [2,1] = [0,1] + [2,0] = [1,1] + [1,0] =
[2,0] + [0,1] are three partitions of [2,1] into two parts (for more examples see
Andrews [1]).

Suppose we add the constraint that only $a_1$ or $b_m$ may be zero while all
remaining $a_j, b_k \ge 1$, $2 \le j \le m$, $1 \le k \le m - 1$. Denote any partition that
satisfies this as valid. For instance, let k = 2, m = 3; then the m-partitions of
$[k, n - k]$ are: {[0,1][1,1][1,n-4]}, {[0,1][1,2][1,n-5]}, ..., {[0,1][1,n-3][1,0]},
{[0,2][1,1][1,n-5]}, {[0,2][1,2][1,n-6]}, ..., {[0,2][1,n-4][1,0]}, ...,
{[0,n-3][1,1][1,0]}. For $[k, n - k]$, let $\mathcal{P}_{n,k}$ be the collection of all valid partitions of
$[k, n - k]$.

Let $F_k$ denote all binary functions on [n] which take the value 1 over exactly
k elements of [n]. Define the mapping $\Pi : F_k \to \mathcal{P}_{n,k}$ where for any $f \in F_k$ the
partition $\Pi(f)$ is defined by the following procedure: Start from the first element
of [n], i.e., 1. If f takes the value 1 on it then let $a_1$ be the length of the constant
1-segment, i.e., the set of all elements starting from 1 on which f takes the constant
value 1. Otherwise if f takes the value 0 let $a_1 = 0$. Then let $b_1$ be the length of
the subsequent 0-segment on which f takes the value 0. Let $[a_1, b_1]$ be the first
part of $\Pi(f)$. Next, repeat the following: if there is at least one more element of [n]
which has not been included in the preceding segment, then let $a_j$ be the length
of the next 1-segment and $b_j$ the length of the subsequent 0-segment. Let $[a_j, b_j]$,
$j = 1, \ldots, m$, be the resulting sequence of parts where m is the total number of
parts. Only the last part may have a zero valued $b_m$ since the function may take
the value 1 on the last element n of [n] while all other parts $[a_j, b_j]$, $2 \le j \le m - 1$,
must have $a_j, b_j \ge 1$. The result is a valid partition of $[k, n - k]$ into m parts.
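The procedure defining the partition of a function amounts to reading off the lengths of its alternating 1-segments and 0-segments. A minimal sketch, assuming f is given as a list of bits (f[0] holding f(1)) and using a hypothetical function name:

```python
from itertools import groupby


def run_partition(f):
    """Return the parts (a_j, b_j): lengths of alternating 1- and 0-segments of f."""
    runs = [(value, len(list(group))) for value, group in groupby(f)]
    if runs and runs[0][0] == 0:
        runs.insert(0, (1, 0))  # f starts with a 0-segment, so a_1 = 0
    if runs and runs[-1][0] == 1:
        runs.append((0, 0))     # f ends with a 1-segment, so b_m = 0
    # runs now alternate 1-segment, 0-segment, ...; pair them up
    return [(runs[i][1], runs[i + 1][1]) for i in range(0, len(runs), 2)]
```

For f = [0, 1, 1, 0, 1] this gives the parts (0, 1), (2, 1), (1, 0), whose componentwise sum is (3, 2), i.e. [k, n - k] with k = 3 ones among n = 5 elements.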

Clearly, every $f \in F_k$ has a unique partition. Therefore $\Pi$ is a bijection.
Moreover, we may divide $\mathcal{P}_{n,k}$ into mutually exclusive subsets $V_m$ consisting of all
valid partitions of $[k, n - k]$ having exactly m parts, where $1 \le m \le n$. Thus
\[ |F_k| = \sum_{m=1}^{n} |V_m|. \]

Consider the following constraint on components of parts:
\[ a_i, b_j \le \begin{cases} N, & i = 1,\ j = m \\ 2N + 1, & 2 \le i \le m,\ 1 \le j \le m - 1. \end{cases} \tag{1} \]
Denote by $V_{m,N} \subset \mathcal{P}_{n,k}$ the collection of valid partitions of $[k, n - k]$ into m parts
each of which satisfies this constraint.

Let $F_{k,N} = F \cap F_k$ consist of all functions satisfying the margin constraint
in the statement of the lemma and having exactly k ones. Note that f having a
margin no larger than N on any element of [n] implies there does not exist a first
or last segment of length larger than N on which f takes a constant value. It also
means that all other 0-segments or 1-segments must not have a length larger than
2N + 1. Hence the parts of $\Pi(f)$ satisfy (1). Hence, for any $f \in F_{k,N}$, its unique
valid partition $\Pi(f)$ must be in $V_{m,N}$. We therefore have
\[ |F_{k,N}| = \sum_{m=1}^{n} |V_{m,N}|. \tag{2} \]
By definition of F it follows that
\[ |F| = \sum_{k=0}^{r} |F_{k,N}|. \tag{3} \]
Let us denote by
\[ c(k, n - k; m, N) = |V_{m,N}| \tag{4} \]
the number of valid partitions of $[k, n - k]$ into exactly m parts whose components
satisfy (1). In order to determine |F| it therefore suffices to determine
$c(k, n - k; m, N)$.

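We next construct the generating function
\[ G(t_1, t_2) = \sum_{\alpha_1 \ge 0} \sum_{\alpha_2 \ge 0} c(\alpha_1, \alpha_2; m, N)\, t_1^{\alpha_1} t_2^{\alpha_2}. \tag{5} \]
For $m > 1$,
\[ G(t_1, t_2) = (t_1^0 + t_1^1 + \cdots + t_1^N)(t_2 + t_2^2 + \cdots + t_2^{2N+1}) \big[(t_1 + \cdots + t_1^{2N+1})(t_2 + \cdots + t_2^{2N+1})\big]^{m-2} (t_1 + \cdots + t_1^{2N+1})(t_2^0 + t_2^1 + \cdots + t_2^N) \tag{6} \]
where the values of the exponents of all terms in the first and second factors
represent the possible values for $a_1$ and $b_1$, respectively. The values of the exponents
in the middle $m - 2$ factors are for the values of $a_j, b_j$, $2 \le j \le m - 1$, and those
in the factor before last and last are for $a_m$ and $b_m$, respectively. Equating this to
(5) implies the coefficient of $t_1^{\alpha_1} t_2^{\alpha_2}$ equals $c(\alpha_1, \alpha_2; m, N)$ which we seek.

The right side of (6) equals
\[ t_1^{m-1} t_2^{m-1} \left(\frac{1 - t_1^{N+1}}{1 - t_1}\right) \left(\frac{1 - t_1^{2N+1}}{1 - t_1}\right)^{m-1} \left(\frac{1 - t_2^{N+1}}{1 - t_2}\right) \left(\frac{1 - t_2^{2N+1}}{1 - t_2}\right)^{m-1}. \tag{7} \]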






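Let
\[ U(x) = \frac{1 - x^{N+1}}{1 - x} \]
which is the g.f. for
\[ u(n) = \begin{cases} 1, & 0 \le n \le N \\ 0, & n > N. \end{cases} \]
Let $W(x) = \left(\frac{1 - x^{2N+1}}{1 - x}\right)^{m-1}$ generate $w_{m-1,2N}(n)$ which is defined in Lemma
3.2. The product $U(x)W(x)$ is the g.f. for their convolution
\[ y(n) = u(n) * w_{m-1,2N}(n) = \sum_{i=(n-N)_+}^{n} w_{m-1,2N}(i). \]
So (7) becomes
\[ \sum_{\alpha_1, \alpha_2 \ge 0} \; \sum_{i=(\alpha_1 - N)_+}^{\alpha_1} \; \sum_{j=(\alpha_2 - N)_+}^{\alpha_2} w_{m-1,2N}(i)\, w_{m-1,2N}(j)\, t_1^{\alpha_1 + m - 1} t_2^{\alpha_2 + m - 1}. \tag{8} \]
Equating the coefficients of $t_1^{\alpha_1'} t_2^{\alpha_2'}$ in (5) and (8) yields
\[ c(\alpha_1', \alpha_2'; m, N) = \sum_{i=(\alpha_1' - m - N + 1)_+}^{\alpha_1' - m + 1} \; \sum_{j=(\alpha_2' - m - N + 1)_+}^{\alpha_2' - m + 1} w_{m-1,2N}(i)\, w_{m-1,2N}(j), \]
for $m - 1 \le \alpha_1', \alpha_2' \le m(2N + 1) - 2(N + 1)$. Substituting $k$ for $\alpha_1'$ and $n - k$ for $\alpha_2'$ and
combining (2), (3) and (4) yields the result. □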

The next lemma extends the result of Lemma 3.3 to classes $\mathcal{H}$ of finite
VC-dimension.

Lemma 3.4. Let $n \ge 1$ and $0 \le d \le n$. Let $\mathcal{H}$ be a class of binary-valued functions
h on [n] satisfying $\mu_h(x) \le N$ on any $x \in [n]$ and let $VC(\mathcal{H}) \le d$. Then
\[ |\mathcal{H}| \le \beta_d^{(N)}(n) \]
where $\beta_d^{(N)}(n)$ is defined in Lemma 3.3.

Proof. The proof builds on that of Lemma 7 in Haussler & Long [5], which
considered generalizations of the VC-dimension, and is done by double induction on n
and d.

Start with the case d = 0: the bound reduces to $|\mathcal{H}| \le 1$ since $\beta_0^{(N)}(n) \le 1$
when $n \ge 1$. The bound is correct since if $|\mathcal{H}| > 1$ then there are two
distinct functions h, g. Let $k \in [n]$ be the element on which they differ. Then the
singleton {k} is shattered by $\mathcal{H}$, hence the VC-dimension of $\mathcal{H}$ is at least 1, which
contradicts the assumption that d = 0; hence $|\mathcal{H}| \le 1$ and the lemma holds.

Next, suppose d = n. Consider the class F in Lemma 3.3 with r = n. Such F
consists of all binary-valued functions f on [n] which satisfy the margin constraint
$\mu_f(x) \le N$ on every $x \in [n]$. By Lemma 3.3, $|F| \le \beta_n^{(N)}(n)$. Clearly by definition,
$\mathcal{H} \subseteq F$. Hence $|\mathcal{H}| \le \beta_n^{(N)}(n)$ as claimed.

Next, suppose $0 < d < n$. Define $\pi : \mathcal{H} \to \{0,1\}^{n-1}$ by $\pi(h) = [h(1), \ldots, h(n-1)]$.
Define $\alpha : \pi(\mathcal{H}) \to \{0, 1\}$ by $\alpha(u_1, \ldots, u_{n-1}) = \min\{v : \exists h \in \mathcal{H},\ h(i) =
u_i,\ h(n) = v,\ 1 \le i \le n - 1\}$. Define $A = \{h \in \mathcal{H} : h(n) = \alpha(h(1), \ldots, h(n-1))\}$
and denote by $A^c = \mathcal{H} \setminus A$. Considering all $h \in \mathcal{H}$, if the minimal value h(n) is 1
then $A^c$ is empty. Otherwise, it is not empty and its members take the value 1 on
n.

Make the inductive assumption that the claimed bound holds for all classes $\mathcal{H}$
on any subset of [n] having cardinality n - 1 and satisfying the margin constraint.
Then we claim the following:

Claim 1.
\[ |A| \le \beta_d^{(N)}(n - 1). \]
This is proved next: the mapping $\pi$ is one-to-one on A and the set $\pi(A)$ has
VC-dimension no larger than d since any subset of [n] shattered by $\pi(A)$ is also
shattered by A which is in $\mathcal{H}$ and $VC(\mathcal{H}) \le d$. Hence by the induction hypothesis
\[ |\pi(A)| \le \beta_d^{(N)}(n - 1) \]
and since $\pi$ is one-to-one then $|A| = |\pi(A)|$. □

Next, under the same induction hypothesis, we have:

Claim 2.
\[ |A^c| \le \beta_{d-1}^{(N)}(n - 1). \]
We prove this next: First we show that $VC(A^c) \le d - 1$. Let $E \subseteq [n]$ be
shattered by $A^c$ and let $|E| = l$. Note that $n \notin E$ since as noted earlier h(n) = 1
for all $h \in A^c$. For any $b \in \{0, 1\}^{l+1}$, let $h \in A^c$ be such that $h_{|E} = [b_1, \ldots, b_l]$.
If $b_{l+1} = 1$ then $h(n) = b_{l+1}$ since all functions in $A^c$ take the value 1 on n. If
$b_{l+1} = 0$ then since $A^c$ is non-empty, there exists a $g \in A$ which satisfies g(i) = h(i),
$1 \le i \le n - 1$, and $g(n) = \alpha(h(1), \ldots, h(n-1))$, the latter being g(n) = 0. It follows
that $E \cup \{n\}$ is shattered by $\mathcal{H}$. But by assumption $VC(\mathcal{H}) \le d$ and $n \notin E$, hence
$|E| \le d - 1$. Since E was chosen arbitrarily then $VC(A^c) \le d - 1$. The same
argument as in the proof of Claim 1 applied to $A^c$, using d - 1 to bound its
VC-dimension, obtains the statement of Claim 2. □

From Claims 1 and 2 and recalling the definition of $c(k, n - k; m, N)$ from
Lemma 3.3, it follows that
\begin{align*}
|\mathcal{H}| &\le \beta_d^{(N)}(n - 1) + \beta_{d-1}^{(N)}(n - 1) \\
&= \sum_{k=0}^{d} \sum_{m=1}^{n-1} c(k, n - k - 1; m, N) + \sum_{k=0}^{d-1} \sum_{m=1}^{n-1} c(k, n - k - 1; m, N) \\
&= I(n \le N) + \sum_{k=1}^{d} \sum_{m=1}^{n-1} c(k, n - k - 1; m, N) + \sum_{k=1}^{d} \sum_{m=1}^{n-1} c(k - 1, n - k; m, N) \\
&= I(n \le N) + \sum_{k=1}^{d} \sum_{m=1}^{n-1} \big(c(k, n - k - 1; m, N) + c(k - 1, n - k; m, N)\big) \tag{9}
\end{align*}
where the indicator $I(n \le N)$ enters here since in case k = 0 the only valid function
is the constant-0 on [n] with $n \le N$. We now have:

Claim 3.
\[ \sum_{m=1}^{n} c(k, n - k; m, N) = \sum_{m=1}^{n-1} \big(c(k, n - k - 1; m, N) + c(k - 1, n - k; m, N)\big). \tag{10} \]







Note that this is a recurrence formula for the number (left hand side of (10))
of valid partitions of $[k, n - k]$ (excluding the case k = 0) into parts that satisfy
(1).

We prove the claim next: given any such partition $\pi_n$ there is exactly one
of four possible ways that it can be constructed by adding a part to a valid
two-dimensional partition $\pi_{n-1}$ of [n - 1] while still satisfying (1). The first two amount
to starting from a partition $\pi_{n-1}$ of $[k - 1, n - k]$ and: (i) adding the part [1,0]
algebraically to any existing part in $\pi_{n-1}$, e.g., $[x, y] + [1, 0] = [x + 1, y]$, to obtain
a $\pi_n$ with $[x + 1, y]$ as one of the parts (provided that (1) is still satisfied), which
yields a total number of parts no larger than n - 1, or (ii) adding [1,0] to $\pi_{n-1}$
as a new last part to obtain a $\pi_n$ (provided it is still valid) with no more than
n parts. The remaining two ways amount to starting from a partition $\pi_{n-1}$ of
$[k, n - k - 1]$ and acting as before except now adding the part [0,1] instead of [1,0]
either algebraically or as a new first part. There are
\[ \sum_{m=1}^{n-1} c(k, n - k - 1; m, N) \]
valid partitions of $[k, n - k - 1]$ and there are
\[ \sum_{m=1}^{n-1} c(k - 1, n - k; m, N) \]
valid partitions of $[k - 1, n - k]$, all satisfying (1). Doing the aforementioned
construction to each one of these partitions yields all valid partitions of $[k, n - k]$
that satisfy (1). □

Continuing, the right hand side of (9) becomes
\[ I(n \le N) + \sum_{k=1}^{d} \sum_{m=1}^{n} c(k, n - k; m, N) = \sum_{k=0}^{d} \sum_{m=1}^{n} c(k, n - k; m, N) \]
which is precisely $\beta_d^{(N)}(n)$. This completes the induction. □

Considering a class $\mathcal{H}$ of VC-dimension d with the margin constraint as
defined in Lemma 3.4, $\beta_d^{(N)}(n)$ is crucial for understanding the effect of the
margin parameter N on the cardinality of $\mathcal{H}$. A recent result (cf. [7]) shows that
$\beta_d^{(N)}(n)$ exhibits an interesting sharp threshold behavior at a point $N = N^*$, i.e.,
the number of functions $h \in \mathcal{H}$ that have a margin $\mu_h(x) \le N$ on every $x \in X$
decreases sharply as N becomes less than $N^*$ and increases sharply as N becomes
larger than $N^*$.



4. Conclusions 

The main result of the paper is a bound on the cardinality of a class of finite
VC-dimension consisting of binary functions on [n] which have a margin greater
than N on a set S of cardinality $l$. This result generalizes the well known Sauer’s
Lemma and is analogous to existing bounds on the covering number of classes of
real-valued functions that have a large margin on a sample S.







References 

[1] Andrews G.E., (1998), The Theory of Partitions, Cambridge University Press.

[2] Anthony M., Bartlett P. L., (1999), Neural Network Learning: Theoretical
Foundations, Cambridge University Press, UK.

[3] Cristianini N. and Shawe-Taylor J., (2000), An Introduction to Support Vector
Machines and other Kernel-based learning methods, Cambridge University Press, UK.

[4] Sauer N., (1972), On the density of families of sets, J. Combinatorial Theory (A),
Vol. 13, pp. 145-147.

[5] Haussler D., Long P.M., (1995), A generalization of Sauer’s Lemma, Journal of
Combinatorial Theory (A), Vol. 71(2), pp. 219-240.

[6] Ratsaby J., (2004), On the Complexity of Good Samples for Learning, Proc. Tenth
International Computing and Combinatorics Conference (COCOON 2004), Jeju
Island, Korea, August 2004, Lecture Notes in Computer Science Vol. 3106,
Springer-Verlag.

[7] Ratsaby J., (2004), A Sharp Threshold Result for Finite-VC Classes of Large-Margin
Functions, Department of Computer Science Technical Report RN/04/06, University
College London.

[8] Vapnik V.N., (1998), Statistical Learning Theory, Wiley.

[9] Vapnik V. N. and Chervonenkis A. Ya., (1971), On the uniform convergence of relative
frequencies of events to their probabilities, Theory Probab. Appl., Vol. 16, pp. 264-280.

Joel Ratsaby 

Department of Computer Science 
University College London 
London WC1E 6BT, U.K.

http://www.cs.ucl.ac.uk/staff/J.Ratsaby
J.Ratsaby@cs.ucl.ac.uk




Index 



q-integrals, 59
q-series, 59, 69

asymmetric exclusion process, 399 
asymptotics, 203 
average case analysis, 149 

basic hypergeometric series, 59 
beta integrals, 59 
binary search tree, 147, 229 
binary tree, 241 
block count, 441 
Boltzmann equation, 517 
branching processes, 311 
branching processes in random 
environment, 375 
branching random walk, 311 
brick codes, 541 
Burgers' equation, 415

catalytic branching random walk, 387 
central limit theorem, 73, 149 
CFTP, 175 

Chebyshev polynomials, 37 
chemical kinetics, 517 
coalescent chain, 295 
coding theory, 491 
combinatorial U-statistics, 73
combinatorial interpretation, 399
combinatorial complexity, 543
combinatorics, 217
conditional least-squares, 539
contingency tables, 175 
convergence of martingales, 229 
cycles of fixed content, 187 

data compression, 217 
decidability, 541 

decomposable combinatorial objects, 187 
descents in samples of random variables, 339 
Dirichlet distribution, 295 
distributional fixed point equation, 311 
divisor functions, 69 
dynamical systems, 473 

ECO method, 25 
edge-removal procedure, 267 
Eulerian graphs, 429 
exclusion process, 415 

fading channel, 491 
first order logic, 495 
fragmentation process, 461 

Galton-Watson forest, 265 
game tree, 163 
generalized hook partitions, 25 
generating functions, 3, 37, 49, 203, 217, 
283, 339 
generation of combinatorial objects, 187 
generation of unlabelled cycles, 187 
geometric distribution, 283 
geometric random variables, 339 
grand averages, 261 
growth constant, 133 

harmonic measure, 445 
Hausdorff dimension, 473 
Hellinger distance, 531 
high order differential equations, 255 
Hoeffding decomposition, 73 
hook partitions, 25 

infinitely divisible distributions, 311 
information visualization, 203 
integer composition, 441 
integer partitions, 25 

labelled trees, 257, 261 
large deviation, 491 
large deviations, 163, 351, 531 
large-margin binary-valued functions, 543 
law of iterated logarithm, 39 
law of large numbers, 149 
learning theory, 543 
lecture hall partitions, 25 
Lempel-Ziv'77, 217 
limit distribution, 255 
limit law, 163 
limit theorems, 265 

Markov chain, 399, 517 
Markov chain algorithm, 429 
matrix inversion, 59 
max fixed point equation, 325 
Mellin analysis, 473 
minimal elements, 507 
minimax tree, 163 
mixing time, 175 
mixtures of distributions, 311 
Move-To-Root, 147 
multi-type branching process, 163 
multiplicative cascades, 351 

necklaces, 187 
normalized cut, 363 

ordered cycle lengths, 39 
ordered trees, 257 

parking function, 81 
partitions, 531 
path coupling, 429 
pentagonal numbers, 69 
performance analysis, 162 
periodic function, 241 
phase transition, 415, 507 
planar graphs, 133 
Poisson distribution, 3 
polyominoes, 541 




probabilistic bin packing, 149 
product form, 517 

quantitative polymerase chain reaction, 539 
Quasi-Powers Theorem, 473 
queueing theory, 309 

random coagulation, 295 
random cutting, 241 
random discrete distribution, 147 
random image, 495 
random structures, 507 
random tree, 255 
random walk, 49, 309, 375, 415 
random walk on groups, 445 
random walk with drift, 445 
randomized algorithm, 163 
rapidly mixing, 429 
recursive algorithm, 163 
recursive algorithms, 162 
recursive trees, 267 
representation of solutions, 311 
restricted permutations, 37 
Rice’s method, 283 
risk theory, 309 

saddle point, 3 
sandpile, 81 
Sanov Theorem, 351 
satisfiability, 507 
second moment, 3 
segmentation, 363 
shifting of the mean, 473 
singularity analysis, 133, 203 
size-dependent branching process, 539 
slit plane, 49 
source coding theory, 200 
spectral clustering, 363 
Spitzer condition, 375 
stable distributions, 311 
stationary distribution, 399 
stochastic fixed point, 325 
stochastic modelling, 539 
Strahler number, 203 
Strassen's law, 39 
subgraph count, 73 

tail bound, 163 
thermodynamic limit, 415 
threshold function, 495 
threshold phenomenon, 507 
transfer operator, 473 
tree profile, 229 
tree statistics, 267 
tries, 200 
two-dimensional critical Bellman-Harris 
branching process, 387 
two-dimensional limit theorem, 387 

undirected graph, 141 
universal codes, 200 
unlabelled trees, 257 



VC-dimension, 543 

Walsh-Hadamard transform, 162 
water cascading, 325 
weighted branching process, 311, 325 

Yule process, 295 

zero-one law, 495 




Author Index 



Ali Khan, Tamur, 163 
Anisimov, Anatoly V., 199 
Archibald, Margaret, 283 
Auber, David, 203 

Barrera, Javiera, 147 
Bellalouna, Monia, 149 
Bertoin, Jean, 295 
Bloznelis, Mindaugas, 73 
Bratiychuk, Mykola S., 309 

Caliebe, Amke, 311 
Cesaratto, Eda, 473 
Climescu-Haulica, Adriana, 491 
Cori, Robert, 81 
Corteel, Sylvie, 3, 15 
Coupier, David, 495 
Creignou, Nadia, 507 

Dartois, Arnaud, 81 
Daude, Herve, 507 
Delest, Maylis, 203 
Desolneux, Agnes, 495 
Domenger, Jean-Philippe, 203 
Duchi, Enrica, 399 
Duchon, Philippe, 203 
Dyakonova, Elena, 375 

Fayolle, Guy, 415, 517 
Fayolle, Julien, 217 
Fedou, Jean-Marc, 203 
Fehrenbach, Johannes, 429 
Fekete, Eric, 229 
Ferrari, Luca, 25 
Frieze, Alan, 95 
Furtlehner, Cyril, 415 

Gimenez, Omer, 133 
Goldschmidt, Christina, 295 
Gnedin, Alexander, 441 
Gyorfi, Laszlo, 531 

Hitczenko, Pawel, 161 
Huang, Hung-Jen, 161 

Jacob, Christine, 539 
Jagers, Peter, 325 
Janson, Svante, 241 
Javanian, Mehri, 255 
Johnson, Jeremy R., 161 

Kijima, Shuji, 175 
Knopfmacher, Arnold, 339 

Lalam, Nadia, 539 
Louchard, Guy, 3 
Lovejoy, Jeremy, 15 

Mairesse, Jean, 445 
Malyshev, Vadim, 517 
Mansour, Toufik, 37 
Manstavicius, Eugenijus, 39 
Marchal, Philippe, 461 
Martinez, Conrado, 187 
Matheus, Frederic, 445 
Matsui, Tomomi, 175 
Micheli, Anne, 257 
Moczurad, Malgorzata, 541 
Moczurad, Wlodzimierz, 541 
Molinero, Xavier, 187 
Morris, Katherine, 261 
Myllari, Tatiana, 266 

Neininger, Ralph, 163 
Nikolopoulos, Stavros D., 141 
Noy, Marc, 133 

Panholzer, Alois, 267 
Papadopoulos, Charis, 141 
Paroissin, Christian, 147 
Pemantle, Robin, 3 
Pinzani, Renzo, 25 
Pirogov, Serguei, 517 
Pittel, Boris, 95 
Prodinger, Helmut, 339 

Ratsaby, Joel, 543 
Reznik, Yuriy A., 199 
Rinaldi, Simone, 25 
Rösler, Uwe, 325 
Rossin, Dominique, 81, 257 
Rouault, Alain, 351 
Rubey, Martin, 49 
Rüschendorf, Ludger, 429 

Schaeffer, Gilles, 399 
Schlosser, Michael, 59 
Simon, Klaus, 69 
Souissi, Salma, 149 

Takacs, Christiane, 363 
Topchii, Valentin, 387 

Vahidi-Asl, Mohammad Q., 255 
Vallee, Brigitte, 473 
Vatutin, Vladimir, 375, 387 

Ycart, Bernard, 149, 495 
Yee, Ae Ja, 15 











■ Gardy, D. / Mokkadem, A., both Université de Versailles-St-Quentin, France 

Mathematics and Computer Science 
Algorithms, Trees, Combinatorics and Probabilities 
2000. 356 pages. Hardcover 
ISBN 3-7643-6430-0 

■ Chauvin, B. / Gardy, D. / Mokkadem, A., all Université de Versailles-St-Quentin, France / 
Flajolet, P., INRIA Rocquencourt, France 

Mathematics and Computer Science II 
Algorithms, Trees, Combinatorics and Probabilities 
2002. 570 pages. Hardcover 
ISBN 3-7643-6933-7 

■ Buescu, J., Instituto Superior Técnico, Lisboa, Portugal / da Silva Dias, A.P. / Castro, 
S.B.S.D. / Labouriau, I.S., all Universidade do Porto, Portugal 

Bifurcation, Symmetry and Patterns 
2003. 224 pages. Hardcover 
ISBN 3-7643-7020-3 




